










McGRAW-HILL PUBLICATIONS IN SOCIOLOGY 

EDWARD BYRON REUTER, Consulting Editor


Elementary Social Statistics 


The quality of the material used in the manufacture
of this book is governed by continued postwar shortages.



McGRAW-HILL PUBLICATIONS IN SOCIOLOGY 

EDWARD BYRON REUTER, Consulting Editor


Angell —The Integration of American Society
Baber —Marriage and the Family
Bowman —Marriage for Moderns
Brown —Social Psychology
Cook —Community Backgrounds of Education
Faris —The Nature of Human Nature
Hagerty —The Training of Social Workers
Haynes —Criminology
Haynes —The American Prison System
Hertzler —Social Institutions
Hertzler —The Social Thought of the Ancient Civilizations
Holmes —Rural Sociology
House —The Development of Sociology
Karpf —American Social Psychology
LaPiere —Collective Behavior
Rural Life in Process
Lumley —Principles of Sociology
McCormick —Elementary Social Statistics
Mead —Cooperation and Competition among Primitive Peoples
North —The Community and Social Welfare
North —Social Problems and Social Planning
Parsons —The Structure of Social Action
Queen and Thomas —The City
Radin —Social Anthropology
Reckless —Criminal Behavior
Reckless and Smith —Juvenile Delinquency
Reuter and Hart —Introduction to Sociology
Reuter and Runner —The Family
Street —The Public Welfare Administrator
Thomas —Primitive Behavior
Thompson —Population Problems
Young —Interviewing in Social Work
Young —Social Treatment in Probation and Delinquency



ELEMENTARY SOCIAL STATISTICS


By Thomas Carson McCormick 

Professor of Sociology, University of Wisconsin


FIRST EDITION
FOURTH IMPRESSION

McGRAW-HILL BOOK COMPANY, Inc.

NEW YORK AND LONDON 
1941 




ELEMENTARY SOCIAL STATISTICS 

Copyright, 1941, by the
McGraw-Hill Book Company, Inc.

PRINTED IN THE UNITED STATES OF AMERICA

All rights reserved. This book, or
parts thereof, may not be reproduced
in any form without permission of
the publishers.


THE MAPLE PRESS COMPANY, YORK, PA. 



To My Wife 

Lillie Griffith McCormick 




Preface 

This beginning textbook in statistical methods has been written 
to meet the needs of undergraduate college students who are 
concentrating in sociology and related subjects. In the choice 
of methods, in the character of the illustrative data and problems, 
and in emphasis throughout, it differs from the texts in economic
or educational statistics that have generally been used
by such students. 

The chief purpose has been to provide students who expect to 
become professional sociologists with the necessary groundwork 
for more advanced training in quantitative research methods. 
Familiarity with the topics included, however, should enable 
those who take no further courses in statistics to understand 
most of the statistical studies and references that now appear 
in the sociological journals and literature. Nonprofessional 
students who go through the course should learn to appreciate 
some of the difficulties involved in the study of social problems, 
and to be more wary of careless and prejudiced thinking in this 
field; for mathematical statistics represents a rigorous form of 
applied logic. 

Unfortunately, most students who elect to specialize in sociology
have no mathematical training beyond high school algebra.
This fact has compelled the omission of mathematical derivations,
with the exception of a few very simple ones. As a
substitute, an attempt has been made to point out assumptions
that should be watched in using the various formulas. Students
who plan to go on in the subject, however, should begin at once
to build up an adequate mathematical background.

Because of its complications and as yet very infrequent use in 
sociological research, small-sampling theory has for the most part 
been omitted from this elementary treatment. 

The amount of material covered is more than enough for a 
semester's work with an average class, so that some selection of 
topics is possible for the instructor. Under certain circumstances,
it may be advisable to omit the less easy sections of
chapters IX, XI, XII, XIII, and XIV. 

Constant practice in working statistical problems is indispensable
for mastery of the subject. The problems given at the end
of each chapter are intended to be only suggestive; they should 
be greatly multiplied for laboratory purposes. 

Thanks are due Professor E. A. Gaumnitz of the University 
of Wisconsin, who has read the manuscript and made helpful 
suggestions, and Mr. Robert J. Hader, who has eliminated 
numerous minor errors. 

Special acknowledgment is made of permission by Prof. R. A. 
Fisher and his publishers, Oliver & Boyd, Edinburgh, to use the 
Table of Chi-square and the Table of Values of the Correlation 
Coefficient for Different Levels of Significance, which appear as 
Tables 2 and 4 in the Appendix of this book. Many other 
publishers have been kind enough to grant permission to use 
tables and material, specific acknowledgment of which has been 
made in place. 

Thomas C. McCormick. 

Madison, Wis., 

August, 1941.



Contents

Preface

PART I
STATISTICS IN SOCIAL RESEARCH

CHAPTER I. Introductory
CHAPTER II. The Quantification of Social Data
CHAPTER III. Factor Control
CHAPTER IV. The Statistical Inquiry

PART II
STATISTICAL METHODS

CHAPTER V. Tabulation of Frequency Distributions
CHAPTER VI. Graphs
CHAPTER VII. Averages and Rates
CHAPTER VIII. Measures of Deviation and Partition
CHAPTER IX. Combination, Probability, and the Normal Distribution
CHAPTER X. Gross Relationship between Two Factors: Simple Linear Quantitative Correlation
CHAPTER XI. Gross Relationship between Two Factors: Nonquantitative Correlation
CHAPTER XII. Sampling and Sampling Errors
CHAPTER XIII. The Significance of Differences
CHAPTER XIV. Time Series Analysis

Appendix
Index










PART I 


Statistics in Social Research 




CHAPTER I 


INTRODUCTORY 

1. The Origins of Statistics.—The word statistics was used in 
Great Britain and Statistik in Germany as early as the eighteenth 
century to refer to collections of information of any kind about a 
state (state-istics). As time passed, “statistics” came to be
limited to quantitative data or figures on wealth, taxes, marriages,
baptisms, deaths, and the like. Distinguished pioneers in the
field were the Germans, Achenwall and Büsching. Modern
agencies representing this type of statistics are the census bureaus 
of the United States and other nations. 

Mathematical statistics, a branch of mathematical theory, 
originated in investigations of birth and death rates and in 
efforts to solve problems growing out of games of chance. Among 
the great early vital statisticians were Graunt and Petty of 
England and Süssmilch of Germany. The fundamentals of
the theory of probability were developed from the seventeenth
to the nineteenth century by such eminent mathematicians as 
Pascal, Bernoulli, de Moivre, Laplace, and Gauss. 

Elementary mathematical statistics was popularized in the 
nineteenth century by the Belgian, Quetelet, who applied it to a 
wide variety of topics, including physical anthropology and 
crime. He is sometimes called the father of social statistics, as
the extension of statistics to sociological problems may be
termed.

A rapid expansion in mathematical statistics and its use in 
science occurred in England during the first quarter of the 
present century through the work of Karl Pearson, following 
earlier efforts by Sir Francis Galton. These two men were 
biologists, and Pearson was a mathematician as well. As a 
result of this phase, modern statistical methods bear the imprint 
of adaptation to biological data. 

Mathematical statistics has gradually become a major method 
of research in the fields of agriculture, biology, educational
psychology, psychology, geography, and physical anthropology.
Among the social sciences, education, economics, and social 
psychology at present lead in the proportion of statistical studies 
published, with sociology fourth and political science fifth. 
Statistical analysis is still rare in cultural anthropology and 
history. In a different direction, statistics is finding application 
in mathematical physics, engineering, and medicine. 

In this book, we shall be interested in elementary statistical 
methods only as tools of investigation in sociology and related 
social sciences. 

2. Quality and Quantity.—A qualitative difference implies a
difference in nature, such as we recognize between a family and a 
church. A quantitative difference refers to a variation in 
amount between two or more instances of the same quality: for 
example, an intelligence quotient (I.Q.) of 112 is 14 units greater 
than an intelligence quotient of 98. 

Different qualities must be compared in terms of common 
subqualities (common denominators), as a Presbyterian family 
and a Methodist church, a family of five members and a church 
of 500 members. A pure quality, moreover, can vary only in 
amount. It follows that all comparison must consist in noting 
what qualities are and are not common to A and B, and how 
each common quality varies in amount from A to B. The city 
and the country may be compared in terms of common qualities
like population density, birth rate, death rate, incidence of
tuberculosis, intelligence quotients, honesty, and so on; but in 
each instance the difference must be in terms of amount. Thus 
the birth rate of the city is 15 per 1,000, that of the country is 
22 per 1,000; and country people are believed to be more honest 
than city people. The last judgment is no less quantitative in 
nature because it is impressionistic and rough. 

Because comparison is basic to knowledge, and quantitative
judgments are inseparable from comparison, quantitative judgments
are unavoidable in science. It is thus easy to understand
why scientists have gradually developed more and more systematic
and reliable ways of making quantitative judgments, such
as we have in the many branches of mathematics, including 
mathematical statistics. 

3. Statistics, the Method of Probabilities.—The questions in 
which social scientists are interested do not have exact or certain
quantitative answers. For example, if we ask what is the
relation between the occurrence of divorce and the presence of 
children in the home, we find that divorce occurs both among 
couples with children and among couples without children,
but relatively more often in the case of the latter. We cannot 
say that divorce takes place only when there are no children; but 
we can say that divorce is reported in so many childless marriages 
per 1,000, and in so many fertile marriages per 1,000. Or, 
expressing it a little differently, we can say that the chances of 
divorce are X in 1,000 in the case of a childless couple, and Y 
in 1,000 in the case of a fertile couple. 
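
The arithmetic behind such a statement can be sketched in a few lines of Python. The counts used here are hypothetical, chosen only to illustrate the calculation of a rate per 1,000; they are not figures from any study, and the function name is an assumption of this sketch.

```python
def rate_per_thousand(events, population):
    """Return the number of events per 1,000 units of population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * events / population

# Hypothetical counts, for illustration only.
childless_marriages, childless_divorces = 4000, 60
fertile_marriages, fertile_divorces = 6000, 48

x = rate_per_thousand(childless_divorces, childless_marriages)  # chances X in 1,000
y = rate_per_thousand(fertile_divorces, fertile_marriages)      # chances Y in 1,000
print(f"Childless couples: {x:.1f} per 1,000; fertile couples: {y:.1f} per 1,000")
```

With these made-up counts the two rates differ, but neither is 0 or 1,000, which is the sense in which the answer is a matter of chances rather than certainty.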

Statistical methods are specially designed for the analysis 
of quantitative¹ data like those above that result from many
causes, some or all of which cannot be completely controlled. 
Outside the scientific laboratory, and even in much laboratory 
research, adequate control over all factors is out of the question. 
For this reason, the statistical method has general application. 

Mathematical statistics is a direct logical extension to practical
situations of the exact quantitative methods used in the laboratory
experiments of the physical sciences. When precise
measurement and complete control over all factors are possible,
a mathematical equation can be set up from which the value of
a dependent factor, Y, can be estimated exactly for any given
value of an independent factor, X. For example, if we know
the distance, X, of an object from the ground, we can calculate
from the law of falling bodies the time, Y, it will take for the
object to fall in a vacuum. When we actually drop an object
under these controlled conditions, a stop watch will always
register the length of time predicted by the equation. The
likelihood that the period observed in any competent repetition
of the experiment will be that computed from the equation is
certainty.
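
As an illustration of such an exact equation, the following Python sketch computes the time of fall from rest in a vacuum. The text does not print the formula; the form used here, Y = sqrt(2X / g), is the standard statement of the law of falling bodies, and the value of g and the metric units are assumptions of this sketch.

```python
import math

G = 9.80665  # standard gravity in m/s^2 (assumed value; the text gives none)

def fall_time(distance_m, g=G):
    """Time in seconds for an object to fall distance_m metres from rest in a vacuum."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return math.sqrt(2 * distance_m / g)

# The same distance X always yields the same predicted time Y.
for x in (1.0, 5.0, 19.6133):
    print(f"X = {x} m  ->  Y = {fall_time(x):.3f} sec")
```

Nothing in the function is probabilistic: repeated calls with the same X return the same Y, which mirrors the certainty of the controlled experiment.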

¹ By quantitative data are meant data that can be measured or counted,
as discussed below in Chap. II.

If, however, the object happens to be a feather which is
dropped under ordinary atmospheric conditions rather than
in a vacuum, the situation is different. In proportion as the
factors are uncontrolled or unknown, the stop watch will no
longer register the time predicted by the law of falling bodies.
Nevertheless, if a large number of experiments are made by
dropping the feather under ordinary atmospheric conditions
from the same distance, the time of falling will be found to vary
around some average time, being sometimes more and sometimes
less. Similarly, the average time of falling from other distances,
X, can be found, and an equation worked out from which the
average time of falling, Y, can be estimated for any distance, X.
Then by studying the varying time required for the object to
fall a given distance, it may be established that in, say, two-thirds
of the trials the time does not vary from the average time, say
2 sec., by more than, say, 0.1 sec. This enables us to make a
prediction from our empirical equation. We can say that if our
feather is dropped under ordinary atmospheric conditions from
a given height, the time required to fall will, two out of three
times, in the long run, vary from an estimated average of 2 sec.
by not more than 0.1 sec. in either direction. That is, in two
out of three trials, the time of falling will be between 1.9 and
2.1 sec.
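
The kind of summary described here, an average together with the share of trials falling close to it, can be sketched in Python. The stop-watch readings below are hypothetical and are recorded in hundredths of a second so that the comparison with the tolerance is exact; the function name is an assumption of this sketch.

```python
from statistics import mean

def share_within(times, tolerance):
    """Return (average, share of trials within +/- tolerance of the average)."""
    avg = mean(times)
    inside = sum(1 for t in times if abs(t - avg) <= tolerance)
    return avg, inside / len(times)

# Hypothetical readings in hundredths of a second (200 = 2.00 sec).
times = [200, 195, 205, 180, 220, 200, 195, 205, 225, 175, 190, 210]

avg, share = share_within(times, tolerance=10)  # 10 hundredths = 0.1 sec
print(f"average = {avg / 100:.2f} sec; share within 0.1 sec = {share:.2f}")
```

With these made-up readings, 8 of the 12 trials lie between 1.9 and 2.1 sec., so the share is two-thirds, matching the form of the prediction in the text.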

This is, broadly speaking, the kind of estimate that mathematical
statistics furnishes in the social sciences. In essence it is
always a calculation of probabilities. The “pure” mathematical
formula of the laboratory is merely a special case of the statistical 
equation, being the limit that the latter approaches as the amount 
of control and precision of measurement are increased. If 
sociological data could be exactly controlled and measured, the 
element of probability would disappear, and the statistical 
equation would become a precise one like the law of falling 
bodies. 

4. Representative Data.—Most sociological studies, statistical
or otherwise, deal with samples rather than with complete data. 
If farm life in a given state is to be investigated, certain farms 
are taken as a sample to represent all the farms in the state 
regarded as the universe. The essential requirements of a good 
sample are that every item in the universe from which the sample 
is drawn shall have an equal chance of being included in the 
sample, and the sample must be large enough to include every 
kind of item in the universe in something like the correct proportions.
The proper size of the sample depends somewhat on how
much the items in the universe vary among themselves. Poor 
samples that include items from outside the universe they are 
intended to represent, that omit important elements of the
universe, or that include elements of the universe in the wrong
proportions, are a fertile source of false conclusions in social 
research. A large part of mathematical statistics deals with 
the problems of sampling. 
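
The first requirement, that every item in the universe shall have an equal chance of being included, can be sketched with Python's standard `random` module. The list of farms, the sample size, and the seed are all hypothetical; a real inquiry would draw from an actual enumeration of the universe.

```python
import random

def simple_random_sample(universe, n, seed=None):
    """Draw n distinct items so that every item has an equal chance of inclusion."""
    rng = random.Random(seed)          # a private generator; the seed makes the draw repeatable
    return rng.sample(list(universe), n)

# Hypothetical universe: an enumeration of 2,000 farms in a state.
farms = [f"farm-{i:04d}" for i in range(1, 2001)]

sample = simple_random_sample(farms, 100, seed=1941)
print(len(sample), "farms drawn, for example:", sample[:3])
```

Sampling without replacement from the full list prevents items from outside the universe from entering the sample; it does not by itself guarantee that the sample is large enough to reflect the variety of the universe.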

5. Statistics and the Individual.—It is commonly thought that
statistics cannot deal with the individual, but must confine itself
to group averages. There is really nothing to prevent a statistical
investigation of an individual. An individual may be
readily analyzed into factors or units of various kinds, and the 
relationships of these to other factors in the same personality 
and in the environment can be studied by the same methods 
that are now used in studying groups of individuals. As a 
matter of economy, however, society will seldom want to subject 
individuals to scientific study except as types, which, of course, 
lead back to group averages. 

6. Interpretation of Statistical Results.—Statistics employs 
figures and mathematical symbols that represent definite factors 
in a particular problem. In interpreting statistical results, 
therefore, care must be taken that each symbol is given the 
same meaning that was assigned to it at the beginning of the 
problem, and to which no important exceptions were allowed
during the study. 

It is sometimes puzzling to understand the reasons for a 
statistical fact, and offhand explanations may be found at the 
end of even careful studies. But if the original study was not 
sufficiently inclusive to clarify some point of interest, its reliable 
explanation can consistently come only from further research. 
For example, if an investigation discovers that a larger proportion 
of women are married in cities where the number of men exceeds 
the number of women than in cities where the two sexes are 
equal in number, or where women outnumber men, one may 
speculate that this is because men do the proposing. It should be 
made clear, however, that such an explanation is only a plausible
“hunch,” which should be tested if it is considered of enough
importance. 

Difficulty may also be experienced in interpreting just what 
certain statistical concepts mean, e.g., correlation coefficients,
averages, or tests of statistical significance. The only help 
here is a clearer understanding of statistical methods, and 
especially of the mathematical assumptions that underlie them. 





7. Statistics Not a Mechanical Method.—Although the statistical
method allows data to be treated by systematic and
standardized techniques, it is a serious mistake to suppose that
it is a mechanical method that may be substituted for hard and
original thinking. On the contrary, mathematical statistics is
merely a set of powerful logical tools that call for a high type of
judgment and skill for their successful use. The statistical 
investigator must know what techniques are valid and effective 
for a given problem, and when quantitative methods are not 
appropriate at all. He needs insight to select worthwhile 
problems, and intimate knowledge of the data to interpret his 
findings, no less than does any other type of investigator. 

8. Simplicity the Ideal.—The experienced statistician always 
prefers simple to complex methods, when the two are equally 
effective. The beginner will do well not to yield to the temptation
to depart from this sensible rule.

Exercises 

1. Briefly summarize the history of statistics and the extent of its 
use as a method of research. 

2. Distinguish between quality and quantity. Illustrate. 

3. Can you find an exception to the proposition that all comparison 
is quantitative? 

4. To what general kind of research situation is statistics appropriate, 
and why? Illustrate. 

5. What is the relationship of the statistical equation to the mathematical
“law” of physics?

6. a. How exactly can predictions be made by means of statistical 
methods? 

b. How serious a handicap does this impose on social research?

7. What is the likelihood in the field of social research that the statistical
method will some day be replaced by exact mathematical formulas
like those of physics? Explain. 

8. Comment briefly on the following published statement: 

“Jobless Survey to Bare Truth, C. C. Head Says: Pres. George Davis
of the Chamber of Commerce of the United States said Saturday an
impartial survey of the employable jobless would show their numbers
had been exaggerated and disprove alleged needs for spreading work
by reducing working hours.

“He said the chamber recently employed a statistical agency to
make a sample survey of 100 relief recipients in a representative city
of more than 100,000 population. The names of 50 men and 50 women
were picked at random from Federal and local governmental relief rolls
in the city.

“The survey showed, he said, that 44 out of the 100 never had been
employed in private business. Seventeen were over 70 years and 82
never had a bank or savings account.

“He says the figures point out that the greater number of those
labeled as unemployed could not or would not work in private industry
even if jobs were available.”

9. Give an example of representative and unrepresentative, adequate 
and inadequate sampling that might occur, or has occurred, in social 
research. 





CHAPTER II 


THE QUANTIFICATION OF SOCIAL DATA 

1. Definition and Counting.—The methods of statistics are 
applicable only to data that can be expressed in some kind of 
countable units. Any event or quality that can be recognized 
can be counted. If we know a happy marriage when we see one, 
we can count the happy marriages in a sample of marriages. 
Nothing simpler can be done to a concept than to count how 
many times instances of it occur. If the concept is not sufficiently
recognizable for its instances to be counted, one may
fairly assume that it is not yet ready for any kind of scientific 
manipulation, except attempts to arrive at a more reliable 
definition. 

2. Classification.—If a concept (e.g., “conflict behavior”) can
be broken down into two or more subcategories (e.g., “war,”
“revolution,” etc.) that can be defined well enough to be told
apart, its cases can be classified. Classification makes possible 
the counting of instances in each class, which may then serve as a 
basis for considerable statistical analysis. We have simple 
classification whenever data are sorted into categories that are 
entirely unordered with respect to amount. For example, we 
may classify our acquaintances as religious and nonreligious; 
we may classify Americans as native white of native parentage, 
native white of foreign parentage, foreign born, and so on. 
Data may also be classified with respect to two or more criteria 
at a time, as married couples by occupation of husband, by 
income, and by number of children. The points to watch in 
classification are careful, objective definition of the several 
categories in terms of criteria that can be recognized in the 
instances to be classified, and independent reclassification of 
the instances by other competent investigators, to determine 
the reliability of the classification. Logically, any classification 
should be based on the same criterion throughout. Thus, 
it would not do to classify some of the foreign born as Catholics
or Protestants, and the rest as Italians, Jews, Germans, and so on.
Also, any classification should be totally inclusive of the class
defined and exclusive of all other classes.¹ That is, if we are
dealing with all the foreign born in the United States, the 
Rumanians should not be omitted, nor should the American 
Indians be included. 
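
Classification by two criteria at a time, followed by counting the instances in each class, can be sketched in Python with `collections.Counter`. The records and the class boundaries below are hypothetical and serve only to show the procedure.

```python
from collections import Counter

# Hypothetical records: (occupation of husband, number of children).
couples = [
    ("farmer", 3), ("clerk", 1), ("farmer", 2),
    ("clerk", 1), ("laborer", 0), ("farmer", 3),
]

def children_class(n):
    """Assign a count of children to one of three exhaustive, exclusive classes."""
    if n == 0:
        return "none"
    return "1-2" if n <= 2 else "3 or more"

# Each couple falls into exactly one cell of the two-way classification.
table = Counter((occupation, children_class(n)) for occupation, n in couples)
for cell, count in sorted(table.items()):
    print(cell, count)
```

Because the classes are defined so that every count of children falls into one and only one of them, the cell counts add up to the number of couples classified.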

3. Measurement of Amount.—The fact that any quality, such
as happiness in marriage, varies in degree, sooner or later forces 
the sociologist to go beyond the mere counting of instances, and 
to attempt to measure the intensity of the quality in a given 
instance or set of instances. For example, we may score the 
answers of married couples to a questionnaire and may regard 
the score of any couple as an index of the amount of happiness 
that they derive from their relationship. The central problem 
is, again, to find a unit in terms of which at least the relative 
amount of the quality can be measured. This is seldom easy to 
do, and must usually be approached through the devices of 
ranking, rating, or scoring. 

4. Ranking.—Ranking, or the arrangement of the instances
of a quality in order of amount, has been called the most elementary
form of measurement. We consider person A more
cooperative than person B, B more cooperative than C, and so on.
To increase the reliability of these judgments, the ranking may
be done independently by several qualified judges, and the
average ranks taken. Greater accuracy is sometimes obtained
by ranking each item with respect to every other item, i.e., by
all possible pairs. Where qualified and careful judges cannot be
obtained, ranking should not be used. As soon as the instances
of a quality are ranked, they become capable of a fair amount of
statistical treatment, including rank correlation.²
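
The averaging of independent rankings can be sketched in Python. The judges, persons, and ranks below are hypothetical; rank 1 is taken to mean the most cooperative, which is an assumption of this sketch rather than a convention stated in the text.

```python
from statistics import mean

# Hypothetical ranks (1 = most cooperative) given independently by three judges.
ranks_by_judge = {
    "judge_1": {"A": 1, "B": 2, "C": 3, "D": 4},
    "judge_2": {"A": 2, "B": 1, "C": 3, "D": 4},
    "judge_3": {"A": 1, "B": 3, "C": 2, "D": 4},
}

def average_ranks(ranks_by_judge):
    """Average each person's rank over all judges."""
    people = next(iter(ranks_by_judge.values())).keys()
    return {p: mean(r[p] for r in ranks_by_judge.values()) for p in people}

avg = average_ranks(ranks_by_judge)
final_order = sorted(avg, key=avg.get)  # lowest average rank first
print(final_order)
```

Here the judges disagree about A, B, and C individually, yet the average ranks still give a single order; where disagreement is wide, the average conceals it and the ranking should be treated with caution.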

5. Rating.—Similar to ranking is rating, or the classification
of items into ascending, or ordered, classes. There are usually
three to seven of these classes. An odd number allows for a
median class, which is desirable. Thus psychiatrists may rate
persons in terms of their intelligence as Mentally Defective,
Slow-dull, Slow, Average, Fairly Intelligent, Distinctly Capable,
and Very Able. Classification of instances into categories like
these should be done independently by two or more persons as a
check. If there is good agreement in the placing of individual
instances, the percentages of the instances put in each given
category by several judges may then be averaged to improve the
accuracy. Self-ratings may be used, as well as ratings by others.

¹ See “classification” in any text in logic, e.g., E. A. Burtt, Principles and
Problems of Right Thinking, pp. 162-164, Harper & Brothers, New York,
1928.

² See Chap. X, Sec. 8.

6. Scoring.—In the case of most score cards, the experimenter
decides impressionistically what subscore, usually a percentage,
should be given to each aspect of a variable (e.g., the socioeconomic
status of a home). In other cases, a subscore is
determined by counting the number of a certain item present
in each instance (e.g., books in a home), or by measurement in
the stricter sense (e.g., annual family income in dollars). The
total score is the sum of the subscores on the different items 
included in the card. Usually the equality of the units, the 
placing of the zero point, the weightings, and the meaning of the 
total score are open to question; but in any case the total score 
represents a series of accumulated judgments reduced to a 
numerical common denominator. Scoring devices may be quite 
elaborate, as may be seen by inspecting Chapin's “living room
scale” for scoring the socioeconomic status of a home, the
Stanford-Binet intelligence test, or score cards for, say, dairy 
cattle used in judging contests at livestock shows. To show 
that they are parts of the same or associated things, the score 
on each item included on a score card should as a rule be high 
when the total score on the card is high, low when the latter is 
low. The theory of the score card is that the total score is an 
index or function of (varies with) the amount of the quality 
it is attempting to measure. Part of a living room score card 
designed by F. Stuart Chapin to measure the socioeconomic 
status of American homes is reproduced below. 

Chapin's Scale for Rating Living Room Equipment¹

¹ F. Stuart Chapin, Scale for Rating Living Room Equipment, American
Journal of Sociology, Vol. 37, pp. 583, 584, 1932.

DIRECTIONS TO VISITOR

1. The following list of items is for the guidance of the recorder. Not
all of the features listed will be found in any one home. Entries on the
schedules should, however, follow the order and numbering indicated.
Weights appear after the names of the respective items. Disregard
these weights in recording. Only when the list is finally checked
should the individual items be multiplied by these weights and the
sum of the weighted score be computed, and then only after leaving the
home. All information is confidential.

2. Check or underline the articles or items present. If more than 
one, write 2, 3, or 4, as the case may be. 

3. Do not enter the score of any article or feature present. Complete
recording before attempting to enter scores.

4. In cases where the family has no real living room, but uses the
room at nights as a bedroom, or during the day as a kitchen or as a
dining room, or as both, in addition to use of room as the chief gathering
place of the family, please note this fact clearly and describe for what
purposes the room is used.

5. When possible, it is desirable to have a living room checked twice. 
This may be done in either of two ways. 

a. After an interval of two or three weeks the same visitor may 
recheck the room. The first schedule should be marked I, the 
second II. 

b. After an interval or simultaneously, the room may be checked by
two different visitors. One schedule should be marked A, the 
other B. 

Scores of the same homes on two trials should be similar. If a group 
of homes are scored twice there should be a high correlation between 
the scores. Please report findings to F. Stuart Chapin, University of 
Minnesota. 
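
The check on agreement between two trials can be sketched in Python. The directions do not say which coefficient is intended; the product-moment coefficient is used here as one common choice, and the scores of the six homes are hypothetical.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either list has no variation

# Hypothetical total scores of the same six homes on schedules I and II.
trial_1 = [52, 75, 90, 110, 64, 131]
trial_2 = [55, 72, 94, 105, 60, 128]

r = pearson_r(trial_1, trial_2)
print(f"correlation between trials: {r:.3f}")
```

A coefficient near 1 for such paired scores would indicate that the two trials order and space the homes in nearly the same way.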


Schedule of Living Room Equipment

I. Fixed Features

1. Floor: Softwood 1, hardwood 2, composition 3, stone 4.
2. Floor covering: Composition 1, carpet 2, small rugs 3, large rug 4, Oriental rug 6.
3. Wall covering: Paper 1, calcimine 2, plain paint 3, decorative paint 4, wooden panels 5.
4. Woodwork: Painted 1, varnished 2, stained 3, oiled 4.
5. Door protection: Screen 1, storm door 1.
6. Windows: 1 each window.
7. Window protection¹: Screen, blind, netting, storm sash, awning, shutter, 1 each.
8. Window covering¹: Shades 1, curtains 2, drapes 3.
9. Fireplace: Imitation 1, gas 2, wood 4, coal 4.
10. Fire utensils: Andirons, screen, poker, tongs, shovel, brush, hod, basket, rack, 1 each.
11. Heat: Stove 1, hot air 2, steam 3, hot water 4.
12. Artificial light: Kerosene 1, gas 2, electric 3.
13. Artificial ventilators 1.
14. Clothes closets 1.
Total section I ____

II. Built-in Features

15. Book containers: Shelves 1, cases 2.
16. Beds: In a sideboard 1, in a ceiling 2, in a door 3.
17. Desk 1.
18. Window seats 1.
19. Window boxes 1.
Total section II ____

III. Standard Furniture

20. Table: Sewing 1, writing 1, card 1, library, end, tea, 2 each.
21. Chair: Straight, rocker, arm-chair, high chair, 1 each.
22. Stool or bench: High stool, footstool, piano stool, piano bench, 1 each.
23. Couch: Cot 1, sanitary couch 2, chaise longue 3, daybed 4, davenport 5, bed-davenport 6.
24. Desk: Business 1, personal-social 2.
25. Bookcases 1.
26. Wardrobe or movable cabinet 1.
27. Sewing cabinet 1.
28. Sewing machine: Hand power 1, foot power 2, electric 3.
Etc., etc.

¹ If checked out of season, ascertain if used in season and so record.


7. The Scale.—The ideal measuring device is the scale. By 
a scale is meant a sequence of interchangeable external units 
numbered from zero, such as a straightedge marked off into feet 
and inches. In sociology and psychology most attempts to 
develop scales have started from ranks or ratings. One of the 
simplest devices is the so-called graphic rating scale. The 
following is an example. 

    0            25            50            75           100
    |-------------|-------------|-------------|-------------|
Completely   Submissive     Average     Dominating    Completely
submissive                                            dominating

Fig. 1.—A simple graphic rating scale.



















Each judge rates each subject on a separate scale by making a 
mark on the scale where he thinks the subject falls. The 
distance of the mark from “Completely submissive” taken as 
zero is then measured in units of the spatial scale. The final 
rating of each subject is the average of the ratings given him 
by the several judges, provided there is a tendency toward 
agreement among them. The scale may become more objective, 
however, if a subject is scored, say, “80 per cent dominating” 
because he is observed to dominate (as tangibly defined) in 
80 per cent of his contacts. This assumes that one contact is 
equal to another for the purpose in hand; but weighting may be 
applied if needed. Evidently, this kind of scale cannot claim 
the precision of scales in the physical sciences; but it is capable 
of very useful results. 
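
The two scoring procedures just described may be sketched in Python. This is an illustration only: the function names, the agreement threshold `max_spread`, and the sample marks and contacts are assumptions made for the example and do not come from the text.

```python
from statistics import mean


def average_rating(marks, max_spread=25.0):
    """Average several judges' marks (0 to 100) for one subject.

    Returns None when the judges differ by more than max_spread
    scale units. The threshold is an illustrative assumption.
    """
    if max(marks) - min(marks) > max_spread:
        return None
    return mean(marks)


def percent_dominating(contacts, weights=None):
    """Weighted share of observed contacts in which the subject
    dominated (True) rather than submitted (False)."""
    if weights is None:
        weights = [1.0] * len(contacts)  # one contact equals another
    dominated = sum(w for c, w in zip(contacts, weights) if c)
    return 100.0 * dominated / sum(weights)


# Hypothetical data: three judges, then ten observed contacts.
judged = average_rating([70, 75, 80])
observed = percent_dominating([True] * 8 + [False] * 2)
```

Passing a list of `weights` corresponds to the weighting of contacts that the text allows when one contact is not equal to another.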

If the ordinal¹ numbers derived from ranking are subjected to 
arithmetical treatment, such as addition or the calculation of 
means, it is implicitly assumed that the ranked instances are 
equally spaced on a linear scale. Thus, if we rank cities in 
respect to the efficiency of their governments, beginning with 
the least efficient, so that city C is 1, city A is 2, city B is 3, 
etc., and if we then use these ordinals as cardinals in arithmetical 
calculations, we imply that the government of city A is twice as 
efficient as that of city C, that the government of city B is 1.5 
times as efficient as that of city A, etc. This assumption is, of course, 
inaccurate, but sometimes it is the best that can be done, or it is 
good enough for a particular problem. The zero point on such a 
scale is arbitrarily placed, usually coincident with or one unit 
below the lowest rank. 
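
The ratios implied by treating ranks as cardinal numbers can be checked with a few lines of Python. The helper `ratio` and its `zero` argument are hypothetical names introduced for this sketch; the ranks are those of the three cities in the text.

```python
ranks = {"C": 1, "A": 2, "B": 3}  # least efficient city first


def ratio(ranks, upper, lower, zero=0):
    """Ratio of two ranks treated as cardinals measured from `zero`."""
    return (ranks[upper] - zero) / (ranks[lower] - zero)


a_over_c = ratio(ranks, "A", "C")  # 2 / 1
b_over_a = ratio(ranks, "B", "A")  # 3 / 2

# Placing the arbitrary zero one unit lower changes every ratio.
a_over_c_shifted = ratio(ranks, "A", "C", zero=-1)  # 3 / 2
```

The last line shows why the placement of the zero matters: the same two ranks yield a different ratio as soon as the zero is moved.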

The most elaborate effort to build an exact scale yet made 
in the social sciences is probably that of L. L. Thurstone in the 
case of his scale for the measurement of an attitude, a sample of 
which is reproduced below.² Generalizing on Thurstone’s 
method, and introducing minor modifications, it runs about as 
follows. A considerable number of supposed indexes of the 
attribute to be measured are chosen. Let us say that the 
attribute is “radicalism”; then the indexes might include 
membership in the Socialist party, admitted statements made against 

¹ A cardinal number tells how many or how much; an ordinal number 
locates position in a series. 

² L. L. Thurstone and E. J. Chave, The Measurement of Attitude, 
University of Chicago Press, Chicago, 1929. 





the existing social order, membership in a labor union, radical 
papers and journals read, expressed Communistic sympathies, 
signature on radical petitions, participation in strikes, jail 
sentences for radical activity, the authorship of radical articles 
and books, subscriptions to this or that radical doctrine, atheism, 
unconventional sexual behavior, and so on. After these indexes 
have been selected and defined as objectively as possible, they 
are submitted to a number of qualified judges, who are asked to 
rank them in the order of the degree of radicalism that each 
seems to imply. Indexes that appear to indicate about the same 
degree of radicalism are regarded as ties. The indexes are thus 
collected into successive piles, which to the judges should seem 
to be equally spaced apart in degree of radicalism. When the 
judges have finished ranking the indexes, each index is assigned 
the average rank given it by the several judges, except that any 
index about which the judges differ too much is rejected entirely. 

[Figure: a horizontal scale graduated from 0 to 100 (ticks at 0, 3, 5, 10, 
11, 15, 20, ..., 80, 85, 90, 92, 95, 100), with Index C, Index A, and 
Index B marked at points along it.] 

Fig. 2.—Diagram of a generalized Thurstone attitude scale. 

Each index will then have an average rank or scale value, and 
these values may if desired be converted to a percentage scale, 
from the lowest value taken as zero to the highest value taken 
as 100 (see Fig. 2). The scale is then ready to be applied to 
other samples of instances (say persons), by simply checking 
on a list of the indexes those that apply to a given individual, 
adding the scale values of the indexes checked, and averaging 
them. Each individual may thus be given a scale value that 
is supposed to measure in a relative way the amount, say, of 
“radicalism” that characterizes him. 
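
The generalized procedure may be sketched as follows. This is a minimal illustration and not Thurstone's own computation: the function names, the rejection threshold `max_spread`, and the judges' ranks are all assumed for the example.

```python
from statistics import mean


def scale_values(judge_ranks, max_spread=3):
    """Average the ranks given to each index by several judges.

    judge_ranks maps an index name to a list of ranks, one per judge.
    Indexes whose ranks differ by more than max_spread are rejected.
    The threshold is an illustrative assumption.
    """
    return {
        name: mean(ranks)
        for name, ranks in judge_ranks.items()
        if max(ranks) - min(ranks) <= max_spread
    }


def to_percentage_scale(values):
    """Rescale so the lowest value is 0 and the highest is 100."""
    low, high = min(values.values()), max(values.values())
    return {k: 100.0 * (v - low) / (high - low) for k, v in values.items()}


def individual_score(scale, checked):
    """Average the scale values of the indexes checked for a person."""
    return mean(scale[name] for name in checked)


# Hypothetical ranks from three judges for four indexes.
judge_ranks = {
    "union member": [1, 2, 1],
    "reads radical papers": [3, 3, 4],
    "signed petitions": [5, 6, 5],
    "atheism": [1, 6, 3],  # judges differ too much, so it is rejected
}
scale = to_percentage_scale(scale_values(judge_ranks))
score = individual_score(scale, ["union member", "signed petitions"])
```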

Thurstone’s attitude scale has often given results that 
correlated highly with those obtained by much simpler procedures, 
such as graphic rating scales, and ratings¹ or rankings represented 
by consecutive numbers. It has also been criticized on various 
theoretical grounds.² 

¹ For example, individuals are classified as Very Radical, Radical, Neutral, 
Conservative, Very Conservative, and those in the Very Radical group are 
given a score of one, those in the Radical group a score of two, etc. 

² See R. K. Merton, Fact and Factitiousness in Ethnic Opinionnaires, 
American Sociological Review, Vol. 5, pp. 13-28, 1940. 



Sample of a Thurstone Attitude Scale¹

EXPERIMENTAL STUDY OF ATTITUDE TOWARD THE CHURCH

Check (✓) every statement below that expresses your sentiment toward
the church. Interpret the statements in accordance with your own
experience with churches.

                                                                 Scale
                                                                 value
 1. I think the teaching of the church is altogether too
    superficial to have much social significance ...............   8.3
 2. I feel the church services give me inspiration and help me
    to live up to my best during the following week ............   1.7
 3. I think the church keeps business and politics up to a
    higher standard than they would otherwise tend to maintain .   2.6
 4. I find the services of the church both restful and
    inspiring ..................................................   2.3
 5. When I go to church I enjoy a fine ritual service and good
    music ......................................................   4.0
 6. I believe in what the church teaches but with material
    reservation ................................................   4.5
 7. I do not receive any benefit from attending church services
    but I think it helps some people ...........................   5.7
 8. I believe in religion but I seldom go to church ............   5.4
 9. I am careless about religion and church relationships but I
    would not like to see my attitude become general ...........   4.7
10. I regard the church as a static, crystallized institution
    and as such it is unwholesome and detrimental to society
    and the individual .........................................  10.5
11. I believe church membership is almost essential to living
    life at its best ...........................................   1.5
12. I do not understand the dogmas or creeds of the church but
    I find that the church helps me to be more honest and
    creditable .................................................   3.1
13. The paternal and benevolent attitude of the church is quite
    distasteful to me ..........................................   8.2
14. I feel that church attendance is a fair index of the
    nation’s morality ..........................................   2.6
15. Sometimes I feel that the church and religion are necessary
    and sometimes I doubt it ...................................   5.6
16. I believe the church is fundamentally sound but some of its
    adherents have given it a bad name .........................   3.9
17. I think the church is a parasite on society ................  11.0

¹ L. L. Thurstone and E. J. Chave, The Measurement of Attitude, p. 61,
University of Chicago Press, Chicago, 1929.
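
Applying the sample scale to a respondent amounts to averaging the scale values of the statements he checked, as described earlier. The sketch below assumes that procedure; the function name and the respondent are hypothetical, while the scale values are those printed in the sample. Judging from the statements themselves, low values appear to go with attitudes favorable to the church.

```python
from statistics import mean

# Scale values of the 17 statements in the sample above.
SCALE_VALUES = {
    1: 8.3, 2: 1.7, 3: 2.6, 4: 2.3, 5: 4.0, 6: 4.5, 7: 5.7, 8: 5.4,
    9: 4.7, 10: 10.5, 11: 1.5, 12: 3.1, 13: 8.2, 14: 2.6, 15: 5.6,
    16: 3.9, 17: 11.0,
}


def attitude_score(checked):
    """Mean scale value of the statements a respondent checked."""
    if not checked:
        raise ValueError("respondent checked no statements")
    return mean(SCALE_VALUES[n] for n in checked)


# A hypothetical respondent who checked statements 2, 4, and 11.
favorable = attitude_score([2, 4, 11])
```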





















There are also several important methods of converting ranks 
to a scale having more or less equal units and an arbitrary zero 
point that are outside the scope of this text.¹ Probably the 
most scientific is the mathematical method of curve fitting.² 

When the concepts of space, time, money, weight, mass, and 
so on, are used in sociology, they are of course amenable to 
accurate measurement by scales already scientifically established. 

8. Discrete Aggregates.—Population aggregates are of great 
importance in sociological studies. It is possible to define these 
aggregates (communities, neighborhoods, families, and the like) 
so that their number can be counted. We hold that it is also 
possible to measure the size of such aggregates by counting the 
number of individuals that compose them. We do this in the 
belief that the only essentials of measurement are units that 
are equal and interchangeable for a purpose. The sociologist 
finds it more useful for his purposes to measure the size of the 
family in terms of the number of its members than in terms of 
their weight in pounds or their height in inches. The nature 
of a “member” does not vary from person to person in any way 
that interferes with the purpose. Moreover, since there is no 
point in subdividing a “member,” nothing is lost because it is 
logically a discrete unit. This idea of measurement can also be 
extended to any other sociological concept that can be broken 
down into parts that are equal and interchangeable for the 
purpose in hand. 

9. The Measurement of an Intangible Quality.—All attempts 
to measure an intangible quality, such as an attitude, must, of 
course, be indirect in type. The classic example of indirect 
measurement in the physical sciences is a thermometer that uses 
the changing length of a column of mercury as an index of change 
in the amount of the intangible quality “temperature.” In 
the case of the indirect measurement of a quality Y (temperature) 
in terms of an index X (mercury column), there should ideally 

¹ See J. P. Guilford, Psychometric Methods, McGraw-Hill Book Company, 
Inc., New York, 1936; P. M. Symonds, Diagnosing Personality and Conduct, 
pp. 86-89, D. Appleton-Century Company, Inc., New York, 1931. 

² Karl J. Holzinger, Statistical Methods for Students in Education, pp. 
221-224, Ginn and Company, Boston, 1928; C. H. Richardson, An 
Introduction to Statistical Analysis, Chaps. VIII and X, Harcourt, Brace and 
Company, Inc., New York, 1934. 





be a perfect straight-line relationship between the two (see Chap. 
X), so that each unit change in X represents a constant amount 
of change in Y. But since Y is an intangible and cannot be 
directly measured, there is no way of proving that such a 
relationship exists between Y and X. So, while we may be certain that 
a scale distance of say 4X is twice as great as a scale distance of 
2X, we cannot be certain that a scale distance of 4X represents 
twice as much of Y as does a scale distance of 2X. A child 
with an I.Q. of 120 is probably not just twice as intelligent as 
another child with an I.Q. of 60. All devices of indirect 
measurement, including the thermometer, are open to this objection. 
But the scientific and practical usefulness of the thermometer 
and of other indirect measuring devices suggests that for many 
purposes this is not serious. Usually the important things are 
rather that the same absolute reading on the X scale shall always 
represent the same amount of the intangible quality Y, as 
verified by introspection or by some external result in which we 
are interested (e.g., at 32°F., water freezes); and that the X scale 
shall be able to differentiate changes in Y small enough for our 
purposes. We shall then know what to expect from Y when 
the scale registers a certain value of X. If the relationship is 
close enough to permit a useful prediction of Y from the reading 
on the X scale, the latter may still be valuable and is not to be 
discarded until a better index is found. 

In practical scale or score-card making, where there is an 
attempt to measure an intangible quality Y in terms of a tangible 
index X, it is often helpful to set up a “fundamental interval” for 
subdivision. This is done by selecting two extreme observable 
instances of Y, marking the values of X corresponding to them 
“0” and, say, “100” respectively, and dividing the included 
range of X into 100 equal units. In the case of one thermometer, 
the extreme instances of temperature are taken at the melting 
point of ice and at the condensing point of steam. As a parallel, 
in mental testing, for certain purposes we might regard inability 
to pass the first grade in school as indicative of zero intelligence, 
and ability to finish the university with honors as indicative of 
100 per cent intelligence, and represent intermediate degrees of 
intelligence by scores between 0 and 100. As with most 
thermometers, for many purposes the zero point need not denote an 
absolute zero, and the upper limit need not mean the ultimate 





maximum amount of Y. It is important, however, to make sure 
that the “fundamental interval” includes as large a range of 
data as investigators will require. 
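
Setting up a fundamental interval is a simple linear rescaling of the index. The sketch below assumes that reading; the function name and the mercury-column lengths are hypothetical.

```python
def interval_score(x, x_low, x_high, top=100.0):
    """Place a reading x on a 0-to-top scale whose end points are the
    index readings at two extreme observable instances of Y."""
    if x_high == x_low:
        raise ValueError("the two reference readings must differ")
    return top * (x - x_low) / (x_high - x_low)


# Hypothetical mercury-column lengths in millimeters at the melting
# point of ice and at the condensing point of steam.
ICE_MM, STEAM_MM = 20.0, 220.0
reading = interval_score(70.0, ICE_MM, STEAM_MM)
```

Readings outside the two reference points give scores below 0 or above `top`, which is one reason the interval should cover the whole range investigators will need.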

When the quality Y is subjective (e.g., happiness in marriage), 
it has already been implied that there are two ways of testing the 
amount of relationship between it and the tangible index X 
(e.g., a score on the Burgess-Cottrell¹ scale for measuring 
happiness in marriage): (1) by comparing the amount of Y indicated 
by the X instrument with the subjective judgment of the subject 
or of a competent observer (e.g., couples getting high scores 
on the Burgess-Cottrell scale consider themselves happy), which 
is appropriate if interest centers in the subjective quality as such; 
and (2) by checking the readings of the X instrument against 
certain tangible conditions that are ascribed to Y (e.g., low 
happiness scores on the Burgess-Cottrell scale are followed by 
divorce more often than are high scores). These are called 
tests of validity. Validity is also established in part by 
definition and agreement, e.g., the cooperative definition described in 
Chap. IV. Chapin’s living room scale, mentioned above, is 
intended to measure the socioeconomic status of the homes to 
which it is applied. The fact that the card has given higher 
scores when applied to upper middle class homes than when 
applied to middle class homes, determined independently, is 
evidence of its validity. Its reliability was established when 
different observers used it on the same homes with little variation 
in results. 
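
The two checks on the living room scale may be illustrated numerically. The scores below are invented for the example, and the use of the product-moment correlation as the measure of agreement between observers is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Reliability: hypothetical scores given to the same five homes by
# two different visitors.
visitor_a = [62, 75, 88, 104, 131]
visitor_b = [60, 78, 85, 108, 128]
reliability = pearson_r(visitor_a, visitor_b)

# Validity: hypothetical scores of homes whose class was determined
# independently of the scale.
upper_middle = [118, 125, 131]
middle = [84, 90, 97]
validity_gap = mean(upper_middle) - mean(middle)
```

A correlation near 1 between the two visitors points to reliability; a clearly positive gap between the independently classified groups is evidence of validity.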

Evidently, the indirect measurement of a subjective quality 
must wait upon the discovery of a satisfactory tangible index, 
which is to be sought among the apparent results or causes of the 
subjective quality, among the results of common causes, or, 
from a different point of view, among the external aspects of 
the subjective concept. Thus, the expansion and contraction 
of the column of mercury in a thermometer are apparently the 
result of changes in temperature. 

Whether the measurement of an intangible quality by means of 
a tangible index or by means of introspective ratings converted to 
scale values is superior depends upon particular circumstances, 
and especially upon the direction of interest. If possible, both 
should be carried through for purposes of validation. 

¹ E. W. Burgess and Leonard J. Cottrell, Predicting Success or Failure 
in Marriage, Prentice-Hall, Inc., New York, 1939. 





10. Rules of Measurement.—We summarize below what 
are probably the most useful rules of measurement in social 
research. 

1. The quality that it is desired to measure should be defined 
verbally as clearly as possible in the beginning. But the 
measurement of a quality is also a crucial part of its definition. In 
fact, “what the scale measures” may later be regarded as 
preferable to the verbal definition, as equivalent to it, or as not at 
all equivalent to it, depending on the degree of validity 
established for the scale and the usefulness of its results. 

2. The purpose of the measurement should be stated or 
understood. 

3. The unit used should be appropriate to the purpose of the 
measurement. 

4. Units should be equivalent one to another (equal, 
interchangeable) for the purpose in view; except that in the indirect 
measurement of an intangible quality in terms of a tangible 
index the equality of the intangible units is indeterminate, and 
for many purposes is unimportant. 

If the units of a scale are sufficiently equal for a purpose, it is 
safe for that purpose to add or average them, to interchange 
them, or to claim that, say, two units represent twice as much 
of the quality as does one unit. 

For the historian, one year is not equivalent to another; for 
the actuary constructing a life table, it is. 

5. The unit should be applied as exclusively as possible to 
the quality defined for measurement, in accordance with the 
purpose stated. 

That is, in measuring a man's height in inches, we should not 
include his shoes, nor should we measure him in a slouched 
posture. So, in measuring “intelligence,” we should, if possible, 
exclude inequalities of effort. 

6. The unit should be applied to the entire range in which 
the investigator is interested. 

In applying an inch end-over-end, or an inch scale, to measure 
the height of a man, no part of the total distance that is his 
height should be skipped or measured in other than a single 
straight line. When a Fahrenheit thermometer registers the 
temperature, however, it reads above or below a fixed point 
that is arbitrarily called zero. This is adequate for ordinary 





purposes, because most of us are interested only in the range of 
temperature included in the thermometer, and not in an extension 
of that range to a depth never observed in ordinary experience. 
But for some scientific work, the temperature needs to be 
measured from a true zero point, and a different scale is used. 

The ratio of two measurements holds only with reference to 
the zero point from which they are made. If this is not an 
absolute zero, that fact should not be forgotten when interpreting 
the ratios. 
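
The dependence of a ratio on the zero point can be shown with two temperature readings. The Kelvin scale is used here as one example of a scale measured from an absolute zero, which is an assumption of this sketch since the text does not name the scale; the readings are hypothetical.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit reading to the Kelvin scale."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


warm_f, cool_f = 80.0, 40.0

# Ratio measured from the arbitrary Fahrenheit zero.
ratio_from_arbitrary_zero = warm_f / cool_f

# Ratio of the same two temperatures measured from absolute zero.
ratio_from_absolute_zero = (
    fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)
)
```

The first ratio is 2, while the second is only slightly above 1, so "twice as warm" holds only with reference to the Fahrenheit zero.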

7. The size of the unit should be fine enough to detect the 
smallest differences that are of importance for the inquiry, but 
need be no finer. 

8. Final judgment of an instrument designed to measure an 
intangible quality should depend chiefly on tests of its validity 
and reliability. 

Summary.—We have seen that even “subjective” qualities 
are amenable to a great deal of statistical analysis through 
counting, classification, ranking, and rating. They cannot be 
exactly measured unless the form of their theoretical distribution 
is known a priori, or unless they are perfectly correlated with 
some objective index; and it is seldom or never possible to 
demonstrate completely either of these propositions. Nevertheless, 
such qualities have already been measured in both the 
natural and the social sciences successfully enough to satisfy 
many important scientific and practical uses. Devices like 
the Binet test and like those used to score social attitudes, 
socioeconomic status, personality traits, and so on, are promising 
approaches to measurement in social research, and their rapid 
improvement and extension to cover many more sociological 
concepts are to be anticipated. Moreover, objective qualities 
in which sociology is interested not only can be counted, classified, 
and the like, but they can also either be measured by the scales 
already standardized by the physical sciences or they should 
offer no difficulties that are peculiar to the social sciences. 

Exercises 

1. Is anything more than clearness of definition necessary to render 
data amenable to statistical treatment? Illustrate. 

2. Can classification and counting alone form any basis for statistical 
analysis? Illustrate. 

3. What are the main points to watch in the use of classification? 
Illustrate. 





4. How does classification differ from rating? Illustrate. 

5. Give an example of the kind and amount of ability a judge should 
have to qualify as a “rater.” 

6. Name at least one method of converting ranks to scale values. 

7. Devise a simple graphic rating scale for the personality trait of 
“sociability.” 

8. Describe some scoring device used in sociology. What is your 
opinion of it as a measuring instrument? 

9. Distinguish between a scoring device and a scale in the strict 
mathematical sense. 

10. Distinguish between counting and measurement. 

11. Illustrate a sociological problem where counting is equivalent 
to measurement. 

12. Discuss the possibility and necessity of equal units in the 
measurement of an intangible quality. 

13. What is of chief importance in the indirect measurement of an 
intangible quality? 

14. What is meant by the validity of a measuring scale? By its 
reliability? How can an instrument designed to measure an intangible 
quality be validated? Illustrate. 

15. Give an example of an intangible quality of interest to sociology, 
and describe briefly two ways in which it may be measured. 

16. What method of measurement would you apply to answer each 
of the following questions: 

a. Does divorce tend to increase with family income? 
b. Do the ablest people leave the farm for the city? 
c. How do 10 cities compare in respect to good government? 

17. What is the reason for taking a number of measurements of the 
same thing and averaging them? 

References 

Campbell, N. R.: Account of the Principles of Measurement and 
Calculation, Longmans, Green & Company, New York, 1928. 

Chapin, F. Stuart: Measurement in Sociology, The American Journal of 
Sociology, Vol. 40, pp. 426-480, 1935. 

Croxton, F. E., and D. J. Cowden: Applied General Statistics, Chaps. I and 
VII, Prentice-Hall, Inc., New York, 1939. 

Johnson, H. M.: Pseudo-mathematics in the Mental and Social Sciences, 
American Journal of Psychology, Vol. 48, pp. 342 ff., 1936. 

Kirkpatrick, Clifford: Assumptions and Methods in Attitude 
Measurement, American Sociological Review, Vol. 1, pp. 75 ff., 1936. 

Lundberg, G. A.: The Measurement of Socio-economic Status, American 
Sociological Review, Vol. 5, pp. 29 ff., 1940. 

Scates, Douglas: The Essential Conditions of Measurement, 
Psychometrika, Vol. 2, pp. 27 ff., 1937. 

Terman, Lewis M., and Maud A. Merrill: Measuring Intelligence, 
Houghton Mifflin Company, Boston, 1937. 



CHAPTER III 
FACTOR CONTROL 


Among the social sciences the controlled experiment has been 
employed much less than in the natural sciences. As a rule, 
sociologists have either preferred or felt obliged to investigate 
social situations in all their original complexity and confusion. 
The methods for dealing with this kind of data attempt to 
introduce control by means of classification in the case of 
attributes (unmeasured traits, e.g., married, single) and by 
mathematical devices in the case of variables (measured traits, e.g., 
age in years). 

1. The Actuarial Method.—One of the most effective schemes 
of classifying attributes is similar in general principle to that 
employed by actuaries in determining insurance risks.¹ For 
example, a large number of paroled criminals may be sorted into 
relatively homogeneous groups with respect to various criteria, 
such as number of previous arrests, prison record, age, type of 
offense committed, intelligence, and so on, and the rate of 
violation of parole determined for each group. After proper 
testing, these rates may then be used as estimates of the 
probability of violation of other prisoners who fall in the established 
classifications. 

We begin with a specified group of items, say paroled 
prisoners from the Joliet (Ill.) penitentiary on Jan. 1, 1941. The 
simplest classification is a dichotomy, or separation of the A's 
from the not A's. Thus, our parolees may be divided into the 
married and the not married. If we wish to test whether 
marital status (trait A) is associated with success on parole 
(trait B), we compare the proportion of successful parolees 
(B's) among the married parolees (A's) with the proportion 
among the not married parolees (not A's). When there is no 
association, i.e., the traits A and B are independent, the two 

¹ For a more thorough development of this technique, see G. U. Yule and 
M. G. Kendall, An Introduction to the Theory of Statistics, Chaps. I-V, 
Charles Griffin & Company, Ltd., London, 1937. 



proportions will be the same, except for chance errors. In 
other words, if 80 per cent of the married parolees succeeded, 
but only 60 per cent of the not married parolees did so, we would 
conclude that marital status was favorable to success on parole. 
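
The comparison of proportions in a dichotomy may be sketched as follows. The function name and the counts are hypothetical; the counts were chosen only to reproduce the 80 and 60 per cent of the illustration, and the sketch makes no allowance for chance errors.

```python
def success_rates(a_success, a_total, not_a_success, not_a_total):
    """Proportion having trait B among the A's and among the not A's,
    together with the difference between the two proportions."""
    p_a = a_success / a_total
    p_not_a = not_a_success / not_a_total
    return p_a, p_not_a, p_a - p_not_a


# Hypothetical counts: 240 of 300 married and 120 of 200 not married
# parolees succeeded.
p_married, p_not_married, difference = success_rates(240, 300, 120, 200)
```

A difference near zero would be what independence of the two traits leads us to expect; whether a given difference exceeds chance is a separate question.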

Suppose we believe that a good prison record (trait C) also 
makes for success on parole. We test it in the same way as we 
did marital status above, and confirm our belief. It may then 
be worth while to make a double classification of the parolees 
by marital status and by prison record, as shown in Table 1. 
From this table we note that the proportion of successful parolees 
in the group as a whole is 360/500 = 0.72, among the married is 
240/300 = 0.80, and among the married with a good prison record is 
65/70 = 0.93, approximately. On the other hand, among the 


Table 1.—Classification of 500 Parolees by Marital Status and
Prison Record, Joliet, Ill., Jan. 1, 1941. (Hypothetical Data)

                    Parolees, married      Parolees, not married
                   -------------------     ---------------------
Outcome            Record     Record       Record     Record       Total
                   good       not good     good       not good

Successful            65        175           25         95         360
Not successful         5         55           10         70         140

Total                 70        230           35        165         500



not married parolees with a not good prison record, the proportion 
of successes is 95/165 = 0.58 nearly. Evidently, in future groups 
of parolees chosen in the same way and exposed to the same 
general conditions as were the 500 represented in Table 1, a 
married man with a good prison record may be expected to have 
a much better chance of succeeding than a man not married with 
a prison record that is not good. More specifically, for every 
man of the first type that failed, we should expect 6 of the second 
type to fail, out of equal numbers placed on parole. 
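
The proportions quoted from Table 1, and the ratio of about 6 failures to 1, can be reproduced from the cell counts. The data layout and the names below are assumptions of this sketch; the counts are those of the table.

```python
# Counts from Table 1:
# (marital status, prison record) -> (successful, not successful)
TABLE_1 = {
    ("married", "good"): (65, 5),
    ("married", "not good"): (175, 55),
    ("not married", "good"): (25, 10),
    ("not married", "not good"): (95, 70),
}


def success_rate(cells):
    """Proportion successful over the given cells of TABLE_1."""
    won = sum(TABLE_1[c][0] for c in cells)
    lost = sum(TABLE_1[c][1] for c in cells)
    return won / (won + lost)


overall = success_rate(TABLE_1)
married = success_rate([c for c in TABLE_1 if c[0] == "married"])
best = success_rate([("married", "good")])
worst = success_rate([("not married", "not good")])

# Failures expected in the second type for each failure in the first,
# out of equal numbers placed on parole.
failure_ratio = (1 - worst) / (1 - best)
```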

It is, of course, possible to subclassify the cases in Table 1 
still further, either by substituting more complete breakdowns 
for the dichotomies (e.g., married, single, divorced, widowed for 
married, not married), or by introducing additional factors 
(e.g., employment record before arrest).¹ 

¹ See Chap. XI. 










2. The Search for Causes.—It is often said that the underlying 
purpose of all science is prediction. Certainly, scientific 
research constantly seeks to discover causes. Much philosophical 
dispute has occurred regarding the nature and reality 
of a cause, but we shall here say only that we mean by a cause 
any factor whose change under controlled conditions is invariably 
followed or accompanied by a change in a second factor. The 
logicians refer to this as concomitant variation. The kind of 
causes with which practical science is most concerned are simply 
factors that give the easiest and most reliable prediction, or 
understanding, of certain conditions that constitute a problem. 
Thus, if we can always change the divorce rate in a given type of 
social situation by changing the proportion of Protestant-Catholic 
marriages, the intermarriage of Protestants and Catholics 
may be regarded as one cause of divorce under the given 
conditions. In the social sciences, there are always many 
causes that combine to produce any actual situation or result. 
Evidently the divorce rate of a city is the product of a vast 
number of forces, only some of which can be discovered or 
controlled. 

3. Matching Experimental and Control Groups.—The logical 
requirements for establishing a causal relationship are the same 
in every science.¹ It is always necessary to establish the fact 
of concomitant variation. For working purposes, the procedure 
is essentially to introduce, remove, or vary in amount the 
suspected cause, and then to observe or measure the correspond¬ 
ing changes, if any, in the thing that is expected to be affected. 
For example, suppose that we want to test the belief that 
knowledge of the evils of alcohol will prevent young people from 
drinking. We expose a number of such persons to appropriate 
instruction and note what proportion of them acquire the habit 
of drinking within, say, a two-year period. In this group, called 
the experimental group, the supposed cause is present. A second 
group of young people, which may be termed the control group, 
is given no instruction, so that the supposed cause is absent. 
After two years, the proportion of habitual drinkers is 
determined in this group also, and the proportions are compared 
between the experimental and control groups. If the experimental 
group shows a lower percentage of drinkers than the 

¹ See John Dewey, Logic: The Theory of Inquiry, pp. 101, 462, 491, 609, 
Henry Holt and Company, Inc., New York, 1938. 

control group, however, it still cannot be said that the instruction 
made the difference, unless it can also be shown that nothing 
else is likely to have done so. Thus, it is possible that the 
experimental group contained a considerably larger proportion 
of women or of church members than the control group, which 
might make the comparison unfair. It is evidently necessary in 
any experiment that the experimental and control groups shall 
be essentially alike in all important respects that might affect 
the outcome, except for the factor or factors under investigation. 
This must, of course, be taken care of when the experiment or 
investigation is being planned. The young people in our 
experimental group must have no characteristics, except the 
instruction, that will make them more liable or less liable to 
become drinkers than those in the control group. The usual 
way of trying to insure this equality is to match the two groups 
in respect to every important point that may be related to 
drinking, such as age, sex, family background, church 
membership, present drinking habits and attitudes, and so on. Moreover, 
all conditions must remain approximately the same for the two 
groups during the two years that the experiment is under way. 
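
Before the proportions of drinkers are compared, the composition of the two groups can be checked on each matching characteristic. The sketch below is only an illustration: the record layout, the function names, and the eight persons are hypothetical.

```python
def proportion(group, key, value=True):
    """Share of persons in group whose attribute `key` equals `value`."""
    return sum(1 for person in group if person[key] == value) / len(group)


def imbalance(experimental, control, checks):
    """Difference in proportions between the two groups for each
    (attribute, value) pair listed in checks."""
    return {
        (key, value): proportion(experimental, key, value)
        - proportion(control, key, value)
        for key, value in checks
    }


# Hypothetical records: four persons in each group.
experimental = [
    {"sex": "F", "church": True, "drinker": False},
    {"sex": "M", "church": False, "drinker": False},
    {"sex": "F", "church": False, "drinker": True},
    {"sex": "M", "church": True, "drinker": False},
]
control = [
    {"sex": "F", "church": True, "drinker": True},
    {"sex": "M", "church": False, "drinker": True},
    {"sex": "F", "church": False, "drinker": False},
    {"sex": "M", "church": True, "drinker": False},
]

gaps = imbalance(experimental, control, [("sex", "F"), ("church", True)])
effect = proportion(control, "drinker") - proportion(experimental, "drinker")
```

Only when the entries of `gaps` are near zero does the difference in `effect` bear on the instruction rather than on the make-up of the groups.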

4. The Principle of Randomization.—In sociological research, 
however, it is seldom that an investigator can feel that his 
experimental and control groups are actually matched in all 
important respects needed to insure a valid comparison between 
them. He is, therefore, obliged to summon to his aid the 
principle of randomization. Having matched his two groups as well as 
he reasonably can, he then decides by a random draw which 
of each pair of matched subjects, or which subjects from the 
total lot, shall belong to the experimental group and which to 
the control group. If this is not feasible, it may be decided by a 
draw which of the two matched groups shall be the experimental 
one, or this may be done in addition to the above. As long 
as there are only two groups, this latter method of randomization 
alone is not very effective. The experiment will be better 
designed if there can be several groups, or replications, half of 
which are drawn at random to serve as the experimental groups. 
In some cases, indeed, the whole process of matching the groups 
may best be omitted, and dependence placed in subdividing the 
potential events (e.g., a large number of unselected young
people) into two or more groups by random selection. When
any good method of randomization is used, all initial differences 
between the experimental and control groups should be accidents 
of chance.¹
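
The random draw described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject labels, the function names, and the use of a seeded pseudorandom generator in place of a physical draw are all hypothetical and are not part of the original text.

```python
import random

def assign_matched_pairs(pairs, seed=None):
    """For each matched pair, decide by a random draw which member goes
    to the experimental group and which to the control group."""
    rng = random.Random(seed)
    experimental, control = [], []
    for first, second in pairs:
        if rng.random() < 0.5:
            experimental.append(first)
            control.append(second)
        else:
            experimental.append(second)
            control.append(first)
    return experimental, control

def assign_unmatched(subjects, seed=None):
    """Omit matching altogether: shuffle the whole lot of subjects and
    split it into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical labels; each tuple is one pair matched on age, sex, etc.
pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2"), ("D1", "D2")]
experimental, control = assign_matched_pairs(pairs, seed=1)
```

Fixing the seed makes the draw repeatable for checking; in an actual study the draw should not be under the investigator's control.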

5. Pretests and Final Tests.—Whatever method of equalization
is used, it is well, before subjecting the groups to the conditions
of the experiment, to test them to see how much alike the
experimental and control groups really are in pertinent respects. 
This is usually done by means of a pretest, which is the same 
as the final test that will be used at the end of the experiment to 
measure the differences between the groups at that time. Thus, 
in our illustration, we might set up a battery of questions about 
drinking habits that would enable us to decide to what extent 
a young person drank or was predisposed to drink, and if the 
experimental and control groups scored about the same on this 
test, we might regard them as equivalent for the purposes of our 
investigation. 

6. The Influence of Additional Factors.—It is often desirable 
to test the effects of a third factor on the relationship between the 
independent and dependent factors in an experiment. In this 
case, the third factor is inserted and removed, with only the 
independent and dependent factors present. Thus, we might 
observe the influence of sex in studying the influence of instruction
on drinking. Both control and experimental groups would
then be divided by sex, giving four groups rather than two. 

7. The Case of Continuous Variables.—In the illustration 
above, we were dealing with attributes, such as “instruction,”
“no instruction,” “habitual drinkers,” “not habitual drinkers,”
rather than with measured variables, like the amount of instruction
and the amount of the tendency to drink. Although there
is no difference in principle between the two cases, there is 
some variation in procedure. Thus, if we wanted to measure 
the amount of the tendency to drink in relation to the amount 
of instruction given, we should take several groups instead of 
only two. To each of the several groups we should give a 
different amount of instruction, including no instruction at all 

1 For a more advanced discussion of this subject, together with the statistical
techniques of analysis of variance and covariance that have recently
been developed in connection with it, see E. F. Lindquist, Statistical Analysis
in Educational Research, Chaps. IV-VI, Houghton Mifflin Company,
Boston, 1940.



to one group, and note whether there was any relationship 
between the increasing amount of instruction and the tendency 
to drink after two years. As before, we should have to equate 
the groups in all important respects before experimenting with 
them, or else be prepared to make corrections for the differences. 
Of course, we should have to devise scales for measuring the 
amount of instruction and the amount of the tendency to drink, 
before we could treat these factors as continuous variables. 

8. Interfering Variables.—As in the case of attributes above, 
it is usually important to measure the influence of certain interfering
variables. In our drinking experiment, some of these
might be the attitude of the parents toward drinking, the subjects’
ages, their money incomes, and so on. Such variables
are not matched or randomized out of the experiment, but are 
introduced in varying known amounts, and their effects on the 
independent and dependent variables are measured. Factors 
may then be held constant, or their influence subtracted out, by 
mathematical methods.¹ This type of analysis yields more
information and information of a more practical kind than when 
all interfering factors are actually removed by matching or 
are equalized by randomization; and it is also generally easier to 
carry out. 
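
One common way of holding a factor constant by mathematical methods is the first-order partial correlation, which removes the linear influence of a third variable z from the correlation between x and y. The sketch below is an illustration only: the text does not prescribe this particular formula, and the figures are hypothetical values chosen so that the result comes out round.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y with the linear influence of z removed."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical scores, expressed as deviations from their means.
parents_attitude = [1, 1, -1, -1]    # interfering variable z
instruction      = [2, 0, 0, -2]     # independent variable x
drinking         = [2, 0, -2, 0]     # dependent variable y

raw = pearson(instruction, drinking)                       # approximately 0.5
net = partial_correlation(instruction, drinking,
                          parents_attitude)                # approximately 0
```

In these constructed figures the whole of the apparent association between instruction and drinking is traceable to the parents' attitude, so the correlation vanishes once that factor is held constant.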

Exercises 

1. Illustrate the use of the actuarial technique in the prediction of 
success in marriage. 

2. Explain how you would obtain control over interfering factors in a 
study designed to show the effects of the presence of children on the 
divorce rate, or other problem of your choosing. 

3. Comment briefly on the following published statements: 

a. “Despite marked advances in appendicitis diagnosis and surgery,
Wisconsin’s death rate from the ailment, which stood at 11.6 deaths
per 1,000 population in 1911, nevertheless increased to a rate of 18.2
in 1930.”²

b. “Women Are Safer Drivers than Men, Records Reveal: When Mary
and Jack borrow Dad’s car for a ride, they’ll be smart if they let Mary
do the driving.

1 See, for example, Mordecai Ezekiel, Methods of Correlation Analysis,
Chap. XIII, John Wiley & Sons, Inc., New York, 1930; or G. W. Snedecor,
Statistical Methods, rev. ed., Chaps. XII and XIII, Collegiate Press,
Inc., of Iowa State College, Ames, Iowa, 1938.

2 Wisconsin State Board of Health Bulletin, Madison, April-June, 1935,
p. 26.



“For in spite of the young man’s claim to being a better driver, state
highway commission records show that women drivers seldom are
involved in fatal accidents. Young men, however, are involved in
more fatal automobile crashes than any other age class of motorists.

“Few women drivers are found on state highway commission fatality
records, and only one person was killed in the last two years by a girl
driver under 18 years of age.

“State safety workers won’t argue that Mary is a better driver than
Jack, but they do claim that state records indicate she is a safer driver.”

c. “Homemaking Careers Attracting More Girls: In increasing number,
girls are turning attention these days to homemaking as a career.

“The popularity of homemaking courses is shown in the increasing
enrollment in home economics at the University of Wisconsin where
enrollment this fall is nearly 10 per cent above 1936, according to the
director of the course.”

d. “There has been more social progress in the United States in the
last 18 years since women have had the vote.”

e. “The Distilled Spirits Institute, demanding that the Anti-Saloon
League recognize the prevailing downward trend of major crimes, bases
its case largely on this general statement: ‘The total (of all crimes) for
the calendar year 1936 showed a decrease of 112,055 offenses as compared
with 1935.’”

(Turn in to the instructor two examples of the misuse of statistical 
reasoning clipped from newspaper or magazine.) 

References 

Burtt, E. A.: Principles and Problems of Right Thinking, Part III, Harper &
Brothers, New York, 1928.

Chaddock, R. E.: Principles and Methods of Statistics, Chaps. II and III,
Houghton Mifflin Company, Boston, 1925.

Chapin, F. Stuart: An Experiment on the Social Effects of Good Housing, 
American Sociological Review, Vol. 5, pp. 868-879, 1940. 

Dewey, John: Logic: The Theory of Inquiry, especially Chaps. XI and 
XXIV, Henry Holt and Company, Inc., New York, 1938. 

Fisher, R. A.: The Design of Experiments, D. Van Nostrand Company, Inc.,
New York, 1935. 

Good, C. V., A. S. Barr, and D. E. Scates: The Methodology of Educational 
Research, Chaps. IX and X, D. Appleton-Century Company, Inc., 
New York, 1936. 

Goulden, C. H.: Methods of Statistical Analysis, Chaps. I and V, John 
Wiley & Sons, Inc., New York, 1939. 

Peters, C. C., and W. R. Van Voorhis: Statistical Procedures and Their 
Mathematical Bases, Chap. XVI, McGraw-Hill Book Company, Inc., 
New York, 1940. 

Wolf, A.: Essentials of Scientific Method, The Macmillan Company, New
York (no date). 



CHAPTER IV 


THE STATISTICAL INQUIRY 

1. The Role of Nonquantitative Methods.—Access to nonquantitative
methods, such as the historical method, the case
study, and the general interview, is not to be denied the statistical 
investigator in sociology. Many of his problems and ideas will 
be suggested by working with materials of these kinds before 
the statistical study is set up. Also, during the progress of the 
collection of the statistical data and analysis of them, he will 
usually find it invaluable to interview or talk with the informants 
and their neighbors, to saturate himself with their points of 
view and backgrounds, and to judge the reliability of their 
replies to formal schedule questions by shrewd observation. 
Finally, as suggested in Chap. I, in interpreting his statistical 
findings, some important questions are almost certain to arise 
that cannot be answered from the figures in hand, and he will 
want to go back to the living situations for fresh suggestions. 
The statistical investigator is expected, however, to limit his 
formal conclusions to those arrived at by tested quantitative 
methods. 

2. The Problem.—The statistical problem in sociological 
research may vary from what is exploratory and merely fact 
finding to the testing of a sharply stated hypothesis, depending 
upon how much is already known about the subject. We may 
set up a study to find out anything we can about divorce in the 
United States, or we may limit the inquiry to testing the hypothesis
that the occupation of the husband plays an important
part in the situation. Exploratory or fact-finding studies 
should be regarded as merely preliminary to more specific and 
better controlled studies, because the former cannot penetrate 
beneath the surface of social phenomena. The problem should 
also be cut to fit the limitations of time, money, and personnel 
qualifications at the disposal of the investigator. It should 
usually be a problem of obvious theoretical or practical importance,
although a certain amount of research without apparent
value but of interest to the investigator should be encouraged, 
because this kind of probing about has sometimes resulted in 
important scientific discoveries. The availability or lack of 
availability of reliable statistical data is another consideration 
that will affect the choice of a problem. This bears on the 
point that the problem must be capable of quantification or 
measurement. Above all, the problem should lie in the field of 
methodological and informational competence of the investigator, 
but as far as possible outside his field of personal bias. There is 
sometimes a conflict here, as when a Negro sociologist wishes to 
investigate the social conditions of the Negro race. He should 
know the field better for being a Negro, but he is likely to carry 
into the study a racial sympathy that may influence his findings. 
It is very desirable for an investigator to state frankly his biases, 
as well as to do his best to overcome them. 

Of course, no problem should be finally selected until it is 
known to what extent and by what methods it has already been 
studied.¹ Although some investigations need to be repeated or
done differently for confirmation, it sometimes happens that 
a problem has been very satisfactorily solved, and further work 
on it would be a waste of time. What is more likely is that 
certain angles of the problem have been worked out, but other 
angles remain to be investigated. The research worker is, 
therefore, guided by a knowledge of previous work into the 
most profitable channels for further study, and may obtain 
suggestions and warnings from what others have done. 

In dealing with a statistical problem of the more scientific 
sort, it is indispensable to state the problem as a formal hypothesis
or hypotheses to be tested. Such a hypothesis should be so
worded that the task of the investigator is made as easy as 

1 Aids in locating previous sociological research on a topic include the files
of The American Journal of Sociology, The American Sociological Review,
The Journal of Social Forces, Sociology and Social Research, and Population
Index; Social Science Abstracts (1929-1932); P. K. Whelpton, Needed Population
Research, Science Press Printing Company, Lancaster, Pennsylvania,
1938; The Psychological Index; Encyclopedia of the Social Sciences, E. R. A.
Seligman, ed., The Macmillan Company, New York, 1930; Poole's Index to
Periodical Literature; Readers' Guide to Periodical Literature; Annual Magazine
Subject Index; Book Review Digest; United States Catalog: Books in Print;
Cumulative Book Index.




possible. It is usually simpler to use a positive hypothesis than 
a negative one, and then to try to disprove rather than to prove it. 
Strictly speaking, we can never prove a general affirmative 
proposition because we cannot examine all possible cases; but 
a single exception may effectively disprove it. Thus we might
take as a hypothesis, “Any association found between the birth
rate and the business index, with the marriage rate held constant,
is due to chance errors,” and seek to show that in our
particular sample it is not due to chance errors. We can only
disprove, or fail to disprove, such a hypothesis. For practical 
purposes, however, we may regard as provisionally true any 
hypothesis that careful tests have failed to disprove. 
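
The logic of setting up a chance hypothesis and then trying to disprove it can be illustrated with a permutation test. This is a sketch under stated assumptions, not the procedure of the text: the yearly figures are invented, the function names are hypothetical, and for brevity the marriage rate is not held constant as it is in the hypothesis quoted above.

```python
import random
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Estimate how often chance pairing alone would give an association
    at least as strong as the one observed."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    at_least_as_strong = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)            # break any real pairing of x and y
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            at_least_as_strong += 1
    # Count the observed arrangement itself as one of the possible draws.
    return (at_least_as_strong + 1) / (n_permutations + 1)

# Invented yearly figures, for illustration only.
business_index = [82, 90, 97, 104, 110, 101, 88, 75, 70, 79]
birth_rate     = [21.0, 21.4, 21.9, 22.3, 22.6, 22.0, 21.1, 20.3, 20.1, 20.6]
p = permutation_p_value(business_index, birth_rate)
```

A small value of p means that the chance hypothesis is hard to maintain for this sample; a large value means only that we have failed to disprove it.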

3. Secondary Statistical Data.—Research is a cooperative 
social enterprise, and the social investigator often necessarily 
uses data collected by someone else. The chief sources of 
secondary statistical data that are of interest to sociologists 
are the publications of the various bureaus and divisions of the 
Federal, state, county, and municipal governments, and a few 
private agencies.¹

1 Important Federal agencies in the United States include the Bureau of
the Census, the Division of Rural Life and Welfare of the Department of
Agriculture, the Bureau of Agricultural Economics, the Bureau of Labor
Statistics, the Children’s Bureau, Public Health Service, Works Projects
Administration, Division of Vital Statistics of the Bureau of the Census,
National Resources Committee, Interstate Commerce Commission, Central
Statistical Board, Department of Commerce, Department of the Interior,
Federal Bureau of Investigation, National Archives, National Youth
Administration, Tennessee Valley Authority, Women’s Bureau, United
States Employment Service, Immigration and Naturalization Service,
Agricultural Adjustment Administration, Farm Security Administration,
Office of Education in the Department of the Interior, Office of Indian
Affairs in the Department of the Interior. A current summary of Federal
agencies, their subdivisions and activities, is available in the United States
Government Manual issued by the National Emergency Council. A general
source for the purchase of Federal documents is the Superintendent of Documents.
All these agencies are located in Washington, D. C.

Information about births, marriages, divorces, deaths, and the public
health is published by state bureaus of public health or vital statistics, with
offices in the state capitals. State bureaus of correction, departments of
education, departments of agriculture, departments of public welfare, planning
boards, tax commissions, and the like are important sources of data for
students of social conditions. State and private universities and agricultural
colleges also gather and interpret a great deal of information. The
Brookings Institution of Washington, D. C., the National Bureau of Economic
Research of New York, the Russell Sage Foundation of New York,
the Scripps Foundation for Population Research of Oxford, Ohio, and the
Gini Foundation of Palo Alto, Calif., are private organizations whose work
is of value to social investigators.

The latest copies of the Statistical Abstract of the United States, published
by the United States Department of Commerce; the Abstract of the Census
of the United States, published by the United States Bureau of the Census;
and the World Almanac, obtainable at most newsstands, are of frequent use.
Bibliographies include those of Dorothy C. Culver, Methodology of Social
Research: A Bibliography, and of A. F. Kuhlman, Public Documents.

The League of Nations, the International Labor Office, and the International
Institute of Agriculture publish much statistical material of world
interest, available in public libraries.

Any serious statistical research project will, of course, soon lead
far beyond any general summary of sources of data. Much of
the success of the trained investigator depends upon his ingenuity
and persistence in discovering the available data that are pertinent
to his problem. Intimate familiarity with the field of
investigation is the best aid here.

After secondary data are found, however, the investigator
must examine them carefully and critically before he can safely
use them for his special purpose. He needs to know (1) the
definition of the thing that is enumerated in relation to his
purpose, or (2) the definition of the whole that is measured and
of the unit by which it is measured, (3) the exhaustiveness and
mutual exclusiveness of the classification, (4) changes in the
definition, (5) the extent of actual over- or underenumeration or
measurement, (6) the date or period in time to which the data
apply.

A few examples may be of help. In the 1935 Census of
Agriculture in the United States, a farm was carefully defined as

... all the land which is directly farmed by one person, either by his
own labor alone or with the assistance of members of his household, or
hired employees. A ranch, nursery, greenhouse, hatchery, feed lot,
or apiary is considered a farm. Establishments keeping furbearing
animals or game, fish hatcheries, stockyards, parks, etc., are not considered
as farms unless combined with farm operations.

The enumerator was instructed not to report as a farm any tract of
land of less than 3 acres, unless its agricultural products in 1934 were
valued at $250 or more.




A farm may consist of a single tract of land, or of a number of separate 
tracts. These several tracts may be held under different tenures, as 
when one tract is owned by the farmer and another tract is rented by 
him. When a landowner has one or more tenants, croppers, or managers,
the land operated by each is considered a farm. Thus on a
plantation the land operated by each “cropper” or tenant was reported
as a separate farm. The land operated by the owner or manager, by 
means of wage hands, was likewise reported as a separate farm. 

That this definition of a “farm” nevertheless did not suit the 
purposes of all users of the census appears from comments like 
the following: 

The census uses a concept of a “farm” which is an arbitrary statistical
definition violating any sound reasoning from whatever standpoint
we may choose. In counting farm operators the census makes no distinction
between the sharecropper on the one hand, and, on the other
hand, the farmer who operates his property either personally or with the 
aid of a manager and the tenant who operates a farm—strange as it may 
seem, in current American agricultural statistics the plantation does 
not exist. Paradoxically enough, it lives statistically under the disguise
of its direct competitor and adversary, the small family farm . . .
nobody knows how many plantations existed in the United States in
1920, 1925, 1930, or 1935.¹

A great many more farms were enumerated by the Census of 
Agriculture in 1935 than in 1930. Between these two censuses 
no change was made in the definition of a farm; yet there is 
evidence that the 1935 census counted as farms many plots 
that were not counted as farms in 1930, especially in or near 
mining and industrial areas. The depression and unemployment 
caused the occupants of these plots to give more than ordinary 
attention to gardening, chicken raising, and other home production,
and as a result these rural home places were lifted into the
farm class. Since the families and the plots were otherwise just 
the same as they had been in 1930, and the “farmers” added by 
the 1935 census were actually miners and industrial workers who 
would return to their usual employment at the first opportunity, 
it has been felt that the heavy increase in the number of farms 
reported was largely spurious. As usual, however, the error, 
if it may be so called, occurred on the periphery of the definition 

1 Karl Brandt, Fallacious Census Terminology and Its Consequences in
Agriculture, Social Research, Vol. 6, pp. 19-37, 1938.



where the concept defined (a farm) shades off into something 
different (not a farm). Most of the farms added in the above 
manner were quite small, and the value of their products was 
so close to the minimum of $250 that they might easily slip 
in and out of the farm category. The number of farmers 
returned by the census of agriculture is never the same as the 
number found by the accompanying census of occupations. 

In the case of farm laborers, including members of the farmer’s
family working on the home farm, the problem of definition is so 
difficult that not much reliance can be placed in the figures 
furnished by the census. In addition, the census of 1920 was 
taken as of Jan. 1 and that of 1930 as of Apr. 1, and this shift 
of date alone caused a sharp variation in the number of farm 
laborers reported. It is well known that the census of population
underenumerates young children, Negroes, and other
classes that for one reason or another are likely to be overlooked;
that the reporting of the population by years of age overloads
the 5’s and 10’s (e.g., 15, 20), at the expense of the other years
(e.g., 14, 17, 19, 22); and so on.
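
The overloading of ages ending in 5 and 0 can be checked numerically. The sketch below uses Whipple's index, a standard demographic measure that is not mentioned in the text and is introduced here only as an illustration; the age counts are invented. The index compares the number of persons reported at ages 25, 30, ..., 60 with one-fifth of all persons reported at ages 23 to 62, so that 100 indicates no heaping and 500 indicates that every age was reported as a multiple of five.

```python
def whipple_index(counts_by_age):
    """Whipple's index of age heaping.

    counts_by_age maps a single year of age to the number of persons
    reported at that age. Only ages 23 through 62 enter the index.
    """
    ages = range(23, 63)
    total = sum(counts_by_age.get(age, 0) for age in ages)
    heaped = sum(counts_by_age.get(age, 0) for age in ages if age % 5 == 0)
    return 100.0 * 5 * heaped / total

# Invented returns: 100 persons at each age, but 150 at ages ending in 0 or 5.
reported = {age: (150 if age % 5 == 0 else 100) for age in range(0, 100)}
index = whipple_index(reported)    # about 136, i.e., noticeable heaping
```

A table of single years of age from any census can be passed through the same function before the age data are used for finer analysis.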

Such examples suggest only a few of the many pitfalls that 
lie in secondary data, even when collected by a great national 
agency like the Bureau of the Census, which may be regarded 
as unbiased and thoroughly honest in those aspects of its work 
that cannot be checked by the consumer of the data. The 
dangers are usually much greater in the case of data supplied 
by the smaller public agencies, like those of states or cities, and 
by many private agencies. The best rule is to insist, as far as 
possible, on knowing what was done by the collecting agency 
at each step of the data-gathering process, from definitions to 
field work to final tabulation; and on noting what checks they 
have applied to test the accuracy, reliability, and validity of
their data. Only when the investigator is reasonably satisfied
after a painstaking scrutiny of this kind that the data are appropriately
defined and sufficiently accurate for his purpose is he
justified in going forward with the work of analyzing and interpreting
them. Research workers have wasted months of effort
and thousands of dollars before they discovered that the material 
on which they were basing their conclusions was hopelessly 
inaccurate to start with. Obviously, no amount of mathematical 
treatment can make amends for data of this kind. 



4. Primary Statistical Data.—The usual method of gathering 
firsthand data in sociological research is by means of the schedule 
or of the questionnaire. Both are sets of questions to be answered 
in blank spaces provided. The questionnaire is mailed out to 
informants and is not often to be recommended. Not only 
are the persons addressed likely to misunderstand or interpret in 
diverse ways the questions asked, but they seldom answer all 
of the questions, and many of them make no returns at all, 
thereby tending to produce a biased sample. A much sounder 
plan is to have trained interviewers with a schedule visit the 
persons who are to give the information,¹ or transfer the data to
the schedule from available records. The procedure properly 
begins with the formulation of the problem, and ends with the 
analysis of the data, because one step logically determines 
another, and a given investigation should be developed as an 
organic whole. 
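
The way in which incomplete returns from a mailed questionnaire bias a sample can be shown by simple arithmetic. The classes, shares, and rates of return below are hypothetical and serve only to illustrate the point; the function name is likewise an assumption.

```python
def shares_among_returns(population_shares, return_rates):
    """Expected make-up of the returned questionnaires when each class
    of the population returns them at a different rate."""
    returned = {
        group: population_shares[group] * return_rates[group]
        for group in population_shares
    }
    total = sum(returned.values())
    return {group: amount / total for group, amount in returned.items()}

# Hypothetical population: 40 per cent drinkers, 60 per cent nondrinkers.
population_shares = {"drinkers": 0.40, "nondrinkers": 0.60}
# Hypothetical rates of return: drinkers reply less readily.
return_rates = {"drinkers": 0.20, "nondrinkers": 0.50}

observed = shares_among_returns(population_shares, return_rates)
# Drinkers make up about 21 per cent of the returns, against 40 per cent
# of the population.
```

No enlargement of the mailing removes an error of this kind, since the disproportion arises from who replies and not from how many are asked.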

5. The Schedule.—After the problem of fact finding or
hypothesis testing and the general approach to it have been
tentatively determined, the next step is normally to prepare
the schedule. The schedule is nothing more than a list of the
questions which it seems necessary to answer in order to test 
the hypothesis or hypotheses, or to get the facts at which the 
investigation is aimed. Much skill and labor are required to 
include all the essential questions and nothing more. Anything 
that is obvious or beside the point should be omitted. In 
addition, each question must be simple and clear, and must be 
answerable in terms of countable or counted units; and the 
same question should have approximately the same meaning for 
each informant. The units must be capable of objective definition,
so that there will be no serious amount of disagreement
about specific instances. Birth rates, an index of business conditions,
marriage rates, age in years or months, I.Q.’s, “male,”
“female,” “yes,” “no,” dollars, number of persons in family,
occupation, and so on, are acceptable units when carefully 
defined in context. So much difficulty has been experienced with 
a term like “occupation,” however, that the census bureau has 
prepared a large manual with a detailed list of almost every 

1 It is possible to mail out questionnaires to carefully stratified classes of
the population, and to correct the replies in the light of answers obtained by
personal visitation of much smaller samples from each stratum.




conceivable occupation, showing its schematic relationship to 
more inclusive occupational categories. 


THE ENUMERATIVE CHECK SCHEDULE

SPECIMEN FORM EC-1 AND INSTRUCTIONS

Printed below and slightly reduced from its actual size, is a specimen copy of EC-1 with entries made to illustrate
typical situations. For the persons enumerated, these include: a fully employed head of family, a housewife, a part-time
worker, a worker temporarily absent, an unemployed worker, a new worker, a full-time student, a retired invalid,
and a worker on a special Government or emergency project. The specimen EC-1 Form as set out is followed by a
narrative describing the manner in which an enumerator might receive the answers which are recorded on it. The
instructions printed on the back of EC-1 are reproduced on the opposite page.

[Specimen Form EC-1, National Unemployment Census (Confidential): Enumeration of Persons Residing on Selected Postal Routes.]

Thomas E. Brown, the enumerator, begins his work on Monday, November 29, 1937. He has been instructed by
the postmaster and furnished with a package of EC-1 Forms preaddressed for each dwelling on the route to which
he is assigned, as well as a supply of EC-2 notices. The first EC-1 Form bears the address 2102 North Lake Street.
It is not a farm, so he writes “No” in answer to “B”. Mrs. Johnson answers the bell, and when Mr. Brown has introduced
himself, explaining the purpose of his call, she gives him the following information about the members of her
household.

There are 11 in all, including 2 not yet 14 years old. Mr. Brown writes in “11” and “2” for “C” and “D”, respectively,
and proceeds to list the names of the 9 grown-ups, and then to fill in the answers for each as Mrs. Johnson
responds to the questions.

The head of the house is Philip Johnson, age 66. He was fully employed during the week of November 14-20 at
a regular job. She is his wife, has always kept house for the family and does not want work for pay. Their oldest
son, George, is 32. He has a regular job but was put on a part-time basis in September and worked only 16 hours during
the week of November 14-20. Yes, he wants more work. Helen, who is 28, has a job. She was out sick the week of
November 14-20, but has since returned to work. Arthur, age 24, worked for several years up to last summer when
he was laid off. He wanted a job during the week of November 14-20 and has been temporarily away from home in
another city trying to find one. Peter, age 20, has not worked before but he, too, is looking for a job. Mary, age 17,
is still in school. Paul Smith, age 80, is Mrs. Johnson's father who lives with her. He gave up working several years
ago when his health made it impossible for him to carry on. Robert Jones, 24, is a roomer who is a laborer on a WPA
project.

Mrs. Johnson also says that another family, the Smiths, live upstairs in the house. As he does not have an addressed
EC-1 Form for the Smiths, Mr. Brown fills in a blank form for them and proceeds to his interview with Mrs. Smith.


A good rule is that each question in the schedule should be 
answerable either in terms of some standard unit like dollars and 
number of members in family (as defined), or in terms of a check
mark, code number, or letter that refers to a specific list. For
example, after the interviewer learns the subject’s occupation 
he may enter in the schedule the code number of the appropriate 
classification in the census manual of occupations. Open 
questions, in answer to which any word or phrase may be inserted, 
should be avoided. Thus the question, “To what social organizations
does he belong?” is usually less desirable than a comprehensive
list of social organizations to be checked, including
the catchall “Others,” to cover any institutions that may have
been omitted from the list. The ability of informants or
records to furnish sufficiently accurate answers should be considered.
Questions that call for more information than is
likely to be available, that rely too much on memory or on
memory of the distant past, that cause fatigue, or that excite
bias or involve personal interests, either are to be avoided or
special provisions are to be made to estimate, overcome, or
correct for the resulting errors. Questions addressed to an
informant should also be inspected to see if they suggest their
own answers (e.g., “Do you dislike to go to school?”). The
schedule should not modify the behavior it is intended to measure.
Special care should be taken that the schedule is not so
long as to weary or disgust the informants. If it has to be long, 
more than one interview should be allowed, and the informant 
should be paid or otherwise made to feel that the time given to 
it is worth his while. 
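
The advantage of a check list with code numbers over an open question can be seen when the answers come to be counted. The following sketch is an illustration only: the list of organizations, the code numbers, and the function names are hypothetical and are not taken from any actual schedule.

```python
from collections import Counter

# Hypothetical check list; code 9 is the catchall "Others".
ORGANIZATION_CODES = {
    1: "Church",
    2: "Lodge or fraternal order",
    3: "Labor union",
    4: "Farm organization",
    9: "Others",
}

def validate_checks(checked, allowed=ORGANIZATION_CODES):
    """Return the sorted codes checked on one schedule, with duplicates
    dropped, and reject any code that is not on the list."""
    unknown = sorted(set(checked) - set(allowed))
    if unknown:
        raise ValueError(f"codes not on the check list: {unknown}")
    return sorted(set(checked))

def tally(schedules):
    """Count how many schedules checked each organization."""
    counts = Counter()
    for checked in schedules:
        counts.update(validate_checks(checked))
    return {ORGANIZATION_CODES[code]: n for code, n in sorted(counts.items())}

# Four hypothetical schedules; the last informant belongs to nothing.
schedules = [[1, 3], [1], [9, 1], []]
summary = tally(schedules)
```

Because every answer is one of a fixed set of codes, the tabulation needs no later judgment about how to classify a free phrase, and an entry outside the list is caught at once as an error of recording.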

Reproduced above is the enumerative check schedule used as a part of
the National Unemployment Census of 1937. Its purpose was not
to test any hypothesis, but merely to check the number of unemployed
persons enumerated by the voluntary registration plan.
It meets all the requirements mentioned above, except that it
employs a number of questions such as “Does he usually work for
pay?” the clarity and meaning of which are not obvious.

The Enumerative Check Schedule 

INSTRUCTIONS 
Household Information 

A. Location.—Give address fully, including apartment number, floor
number, rear, alley, etc., if necessary to identify the household.

B. Does this household live on a farm?—Consider as a farm any tract of
land locally so regarded.



C. Total number of persons in this household, —Include all persons living in 
the same household unit, including servants and lodgers, also children and 
others temporarily away from this household. 

D. Number less than 14 years of age. —Enter total number of persons in
this household who are less than 14 years of age. 

Questions About Each Person 

Name. —Before making the entries in any other column, list the names of 
all persons 14 years of age and over, then check with items “C” and “D,”
above, to account for every person in the household. 

Write each name on a numbered line; never crowd additional names 
between lines or at bottom of form. For households with more than ten 
members 14 years of age and over, continue the listing on a second form, 
repeating the address. 

Column 1. Sex. —Enter “M” for male and “F” for female.

Column 2. Color or race. —Enter “W” for white, “Neg” for Negro, and
“O” for other. Enter persons of Mexican parentage as “white” (W).
The “other” (O) group includes Indians, Chinese, etc.

Column 3. Age at last birthday. —If the exact age is not known, enter the 
approximate age. 

Column 4. Was this person working for pay (or profit) during the week of
November 14 to 20, 1937?—Enter “Yes” for each person who worked for pay
(salary, wages, fees, commission, supplies, living quarters, etc.) or who
worked for profit (in his own business, store, or on his own farm) at any time
during the week of November 14-20. Enter “Yes” for each part-time
worker, even though he worked only a few hours each day, or only a few
days of that week.

Enter “No” for each person who was NOT working for pay or profit, as
defined above, at any time during that week. In addition to persons who
were totally unemployed, “No” should be entered for the following classes
of persons:

a. Housewives and other unpaid persons engaged only in housework or 
helping without pay in a family business or store or on the family farm. 

b. Sons, daughters, or other relatives who, without pay, help some member
of the household in his work for pay or profit.

c. Full-time students, and retired or disabled persons. 

d. Persons who had jobs but who were temporarily absent from work 
during the entire week because of sickness, strike, vacation, or other similar 
reasons. 
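
The rule for Column 4 can be restated compactly. The Python sketch below is only an illustration of that rule; the record layout (`paid_hours`, `profit_hours`, `unpaid_family_hours`) and the function name are hypothetical and do not appear in the Census instructions, which record only the final “Yes” or “No.”

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical fields for the week of November 14-20, 1937.
    paid_hours: float = 0.0           # work for salary, wages, fees, or pay in kind
    profit_hours: float = 0.0         # work in own business, store, or farm
    unpaid_family_hours: float = 0.0  # housework or unpaid help in a family enterprise


def column_4(person: Person) -> str:
    """Return "Yes" if the person worked for pay or profit at any time in the week."""
    worked = person.paid_hours > 0 or person.profit_hours > 0
    return "Yes" if worked else "No"


part_time_worker = Person(paid_hours=3)          # a few hours still counts
unpaid_helper = Person(unpaid_family_hours=40)   # classes a and b
absent_job_holder = Person()                     # class d: had a job, did no work

print(column_4(part_time_worker), column_4(unpaid_helper), column_4(absent_job_holder))
```

Under this reading, the amount of work does not matter, only whether any paid or profit-making work was done during the week.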


6. The Instructions.—To deal adequately with the definition 
of the terms used in a schedule, it is customary to accompany 
the schedule with a set of instructions, like those that follow 
the check schedule of the National Unemployment Census on 
page 39. A reading of these instructions will give an idea of 
the extent to which they may improve the accuracy of the 






returns. In work of this kind there is, of course, always a 
practical limit beyond which the matter of definition cannot be 
carried. 

7. The Tables.—It is usually impossible to set up a schedule 
with much confidence unless tables to receive the returns are 
made up at the same time. Just what summary statistics are 
wanted should be listed (e.g., means, proportions, correlation
coefficients), and the tables needed to compute and exhibit 
them drawn up, together with a transcription sheet or cards to 
which all of the data will be transferred from the schedules. 

Three of the many tables that were used in connection with the 
enumerative check schedule of the National Unemployment 
Census are shown below. 

Table 2.—Persons Enumerated in Check Areas as Partly
Unemployed or as Part-time Workers, by Sex and Hours
Worked during the Week of Nov. 14-20, 1937*

(Data for persons 15-74 years of age)

                            Partly unemployed           Part-time workers
                          Total    Male   Female      Total    Male   Female

Total                    84,919  60,944   23,975     20,895  11,986    8,909

Reporting                82,898  59,438   23,460     12,388   6,538    5,850
  None                      105      72       33         23      14        9
  1-8 hours               8,268   4,848    3,420      1,193     434      759
  9-16 hours             20,449  13,899    6,550      2,636   1,211    1,425
  17-24 hours            30,195  22,137    8,058      3,747   1,982    1,765
  25-32 hours            18,120  14,028    4,092      3,099   1,808    1,291
  33-40 hours             4,896   3,813    1,083      1,303     849      454
  41 hours or more          865     641      224        387     240      147
Not reporting             2,021   1,506      515      8,507   5,448    3,059

Per cent reporting        100.0   100.0    100.0      100.0   100.0    100.0
  None                      0.1     0.1      0.1        0.2     0.2      0.2
  1-8 hours                10.0     8.2     14.6        9.6     6.6     13.0
  9-16 hours               24.7    23.4     27.9       21.3    18.5     24.4
  17-24 hours              36.4    37.2     34.3       30.2    30.3     30.2
  25-32 hours              21.9    23.6     17.4       25.0    27.7     22.1
  33-40 hours               5.9     6.4      4.6       10.5    13.0      7.8
  41 hours or more          1.0     1.1      1.0        3.1     3.7      2.5

Median                     19.3    19.9     17.7       21.0    22.5     19.3

* From Dedrick and Hansen, Final Report on Total and Partial Unemployment, 1937,
Vol. IV, p. 31, The Enumerative Check Census, Census of Partial Employment, Unemployment,
and Occupations, United States Government Printing Office, Washington, 1938.
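
The percentage and median rows of Table 2 can be recomputed from the counts. The sketch below does this for the partly unemployed, both sexes. Two points are assumptions rather than statements of the source: the class boundaries 0, 8, 16, and so on are chosen because they reproduce the published medians, and the count for the 9-16 hour class is taken as 20,449, the figure implied by the male and female counts (13,899 + 6,550).

```python
# Partly unemployed persons reporting hours, both sexes (Table 2).
# Each class is (label, lower bound, upper bound, count).
# The bounds are an assumption that reproduces the published medians.
CLASSES = [
    ("None", 0, 0, 105),
    ("1-8 hours", 0, 8, 8268),
    ("9-16 hours", 8, 16, 20449),
    ("17-24 hours", 16, 24, 30195),
    ("25-32 hours", 24, 32, 18120),
    ("33-40 hours", 32, 40, 4896),
    ("41 hours or more", 40, None, 865),
]


def percent_distribution(classes):
    """Per cent of all reporting persons falling in each class."""
    total = sum(count for *_, count in classes)
    return {label: round(100 * count / total, 1) for label, _, _, count in classes}


def grouped_median(classes):
    """Median by linear interpolation within the class containing the middle case."""
    total = sum(count for *_, count in classes)
    half = total / 2
    cumulative = 0
    for label, lower, upper, count in classes:
        if cumulative + count >= half:
            if upper is None or upper == lower:
                raise ValueError(f"cannot interpolate within class {label!r}")
            return lower + (half - cumulative) / count * (upper - lower)
        cumulative += count
    raise ValueError("empty distribution")


print(percent_distribution(CLASSES))
print(round(grouped_median(CLASSES), 1))
```

With these assumptions the computed median rounds to 19.3 hours, in agreement with the table.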
























Table 3.—Persons Enumerated in Check Areas as Not Available
for Employment, by Sex, Usual Work Status, Desire for
Work, and Ability to Work*

(Data for persons 15-74 years of age. Percentage not shown where less
than 0.1. Cells marked ... are illegible in this copy.)

                                   Both sexes            Male               Female
                                Number  Per cent    Number  Per cent    Number  Per cent
                                         of pop.             of pop.             of pop.

Total not available for
  employment                   608,460    41.5     102,991    14.2     505,469    68.3

Wanting but not actively
  seeking work                  21,108     1.4       9,222     1.3      11,886     1.6
    Usually work                14,082     1.0       7,491     1.0       6,591     0.9
    Do not usually work          7,026     0.5       1,731     0.2       5,295     0.7

Wanting but unable to work       3,471     0.2       2,264     0.3       1,207     0.2
    Usually work                 2,668     0.2       1,868     0.3         800     0.1
    Do not usually work            ...                 ...                 407

Not wanting and do not
  usually work                     ...    39.8         ...    12.6     492,376    66.5

* From Dedrick and Hansen, Final Report on Total and Partial Unemployment, 1937,
Vol. IV, p. 33, The Enumerative Check Census, Census of Partial Employment, Unemployment,
and Occupations, United States Government Printing Office, Washington, 1938.


As a result of constructing specific tables, the original schedule 
is likely to be considerably amended and improved, especially 
if a complete set of tables is made covering every important step 
in the treatment to which the data are to be subjected, including 
all work tables for the statistical analysis. 

8. Testing the Schedule.—After a schedule has been tentatively
constructed, it should be tested for accuracy, reliability,
and, if necessary, for validity. This applies to each question 
separately and to the schedule as a whole. 

Accuracy may be checked by applying the schedule to known 
data, and noting how closely the returns agree with the a priori 
information. The interviewer employed should have no prior 
knowledge of the data, and should not be aware that a check is 
being made. It is also sometimes possible to include in the 
schedule pairs of questions that get the same information in 
















independent ways; but this is usually confined to a few of the 
most important but least reliable questions. 

Table 4.—Gainful Workers, 1930, and Persons Employed or Available
for Employment in Enumerative Check Areas, 1937, by Sex
and Race as Percentage of Population*

(Data for persons 15-74 years of age)

                                 Both sexes              Male                 Female
                              All         Negro     All         Negro     All         Negro
Year                          races White and other races White and other races White and other
                                          races               races               races

Gainful workers, 1930†         57    56     66       87    87     91       25    23     41

Employed or available
  for employment, 1937         59    58     68       86    86     87       32    30     50

* From Dedrick and Hansen, Final Report on Total and Partial Unemployment, 1937,
Vol. IV, p. 35, The Enumerative Check Census, Census of Partial Employment, Unemployment,
and Occupations, United States Government Printing Office, Washington, 1938.
† Data derived from Fifteenth Census of the United States, Population, Vol. V, p. 117.

Reliability is measured by trying the schedule twice on essen¬ 
tially the same data and comparing the results. It is often 
impractical to apply the schedule more than once to the same 
informant without introducing the memory factor or causing 
an undesirable response. Probably the best that can then be 
done is to apply the schedule to two random samples from the 
same universe of informants, and compare the returns. The 
same interviewer or interviewers should be used in each case. 
In all such tests, the differences observed should fall well within 
the range of random sampling error.¹
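
A minimal sketch of such a comparison follows, for a single question answered “Yes” or “No” in two random samples. The counts are invented for illustration, and the use of the pooled standard error of a difference between two proportions, with a limit of about two standard errors, is an assumed convention rather than a rule laid down here.

```python
import math


def difference_in_standard_errors(yes_a, n_a, yes_b, n_b):
    """Difference between two sample proportions, in units of its standard error."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error


# Hypothetical returns: 212 of 500 "Yes" in sample A, 198 of 500 in sample B.
z = difference_in_standard_errors(212, 500, 198, 500)
print(round(z, 2))
print("within sampling error" if abs(z) < 2 else "larger than sampling error")
```

Here the two samples differ by less than one standard error, so the question would pass this particular test of reliability.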

A schedule, a part of the schedule, or one or more questions 
in the schedule, need to be tested for validity when it is not clear 
that they measure what is intended to be measured. This is 
invariably the case when broad concepts are involved. For 
example, if a schedule is designed to discover the number of the
“unemployed” in the United States as of a certain date, it is
advisable to give careful consideration to the matter of validity.
Whenever a recognized and proved scale for the same purpose 

¹ See Chap. XII.




















already exists, all that is required is to find the amount of agree¬ 
ment between the returns from the two instruments, as used 
on the same data. As a rule, however, this convenient situation 
does not occur: there is no true criterion by which to test the new 
instrument. 

In many cases the proper approach is simply that of finding 
an acceptable definition. With the help of anticipated users of 
the research, the investigator defines (1) what “area” of meaning
of a term (e.g., the “unemployed”) should ideally be measured,
(2) what parts of this area it is practicable to measure reliably
enough for the purposes of the inquiry,¹ and (3) what parts it is
not feasible to measure. The meaning that should ideally be 
measured is the meaning that it is wanted to measure. The 
investigator then tries to find objective and reliable indexes, 
which, by agreement, cover as much of the desired meaning as 
possible. The remaining part that is not covered should then be 
clearly recognized by the investigator and his public, and both 
should regard the omission as not serious enough to invalidate 
the study. Of course, the public may sometimes be the investi¬ 
gator's scientific colleagues, sometimes social welfare agencies, 
and sometimes the general public. Or the investigator may 
merely interpret the interests of the public as he thinks best. 

It will frequently happen that the persons representing the 
consumers of the research will differ in what they want measured. 
In such a case, the choices are (1) to try to include all the desired 
parts of the meaning in a single index, (2) to use separate indexes 
for different parts of the meaning, or (3) to omit some parts of the 
meaning, and thereby reduce the number of people who will be 
satisfied with the results. 

One advantage of the method of setting up an inclusive or 
ideal definition of the area of meaning to be measured and then 
marking out how much of it the given instrument can reasonably 
be expected to measure is the fact that it may be possible in 
later studies gradually to expand the area measured until con¬ 
sumers finally agree either that the result is a satisfactory index 
of the total meaning, or that the part omitted is so intangible 

¹ The Census of Partial Employment, Unemployment, and Occupations: 1937,
whose schedule is shown above, included persons totally unemployed and
wanting work, emergency workers on WPA, NYA, CCC, etc., and persons
partly employed.





and so little agreed upon that it can be disregarded. It is also 
better to know approximately what the instrument does and 
does not measure, i.e., how useful it is for its purpose, than to
say merely that “it measures what it measures!”

If the method of cooperative definition has not been used, or 
if its results are not entirely satisfactory, the question still 
remains whether the index measures what it is wanted to measure. 
Even in the more objective and simple instances, this is not 
always certain. Thus, if we are trying with Thorndike to 
measure the desirability of cities as places of residence,¹ and
include urban death rates as an objective index covering one 
aspect of the concept of desirability, we shall need to ask if the 
rates have been standardized for differences in the age and sex 
composition of the city populations, if the out-of-town deaths 
occurring in local hospitals have been omitted, and so on, before 
we can be sure that the rates reflect differences in the incidence 
of fatal diseases and accidents between cities. In cases like this, 
the validity may be taken as established when our several 
questions are properly answered. But in dealing with less 
objective traits, this may not be enough. Suppose we include 
an attempt to measure the subjective trait of “friendliness” as a 
further element in the desirability of cities as places of residence. 
By the method of the cooperative definition outlined above, 
we may arrive at a combination of the average number of social 
visits and the percentage of the population belonging to social 
organizations as a tangible index of this subjective quality. A 
potential consumer of the investigation who has been consulted, 
however, may say that he has lived in several cities and found 
the people in some much “colder” to newcomers than in others, 
and he doubts that the index will show this difference. If the 
consumer from personal experience can classify certain cities as 
“colder to newcomers” than others, we can apply our index of 
friendliness and see where it places them. If the results are in 
agreement with his observation, he is likely to accept the index. 
Of course, in such cases, the experience or opinion of a single 
individual is not enough. We should actually need to have many 
persons, representative of our public, rate or score a group of 
cities in regard to “coldness to strangers,” and compare their

¹ E. L. Thorndike, Your City, Harcourt, Brace and Company, Inc.,
New York, 1939.





ratings or scores with the results of our index of friendliness.
In doing this, we should be careful to choose as raters individuals
who are well acquainted through actual residence with at least 
some of the cities in question. 
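
One way to compare the raters' scores with the index is a rank correlation. The sketch below is offered only as an illustration: the six cities, their index values, and their mean “coldness” ratings are invented, and the choice of Spearman's coefficient (in its simple form, which assumes no tied ranks) is an assumption, not a method prescribed above.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation, using the formula that is valid without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical data for six cities.
friendliness_index = [62, 48, 75, 51, 80, 40]            # higher = friendlier
mean_coldness_rating = [2.9, 3.1, 1.8, 3.6, 1.5, 4.2]    # higher = colder

rho = spearman_rho(friendliness_index, mean_coldness_rating)
print(round(rho, 3))
```

A strongly negative coefficient, as in this invented case, would mean that cities scoring high on the index are the ones rated least “cold,” which is the agreement the validity test looks for.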

Moreover, the ratings when repeated by the same or like 
groups should give essentially the same results. If the several 
raters show little agreement among themselves, as may happen, 
no criterion at all will result from this procedure. In that case, 
we may need to face the problem of the average. Probably a 
certain city was actually “cold” in its treatment of some of the
raters and not of others. We might then have to devise a
reliable score that would reflect the proportion of the raters who
regarded the city as “cold,” or the amount of “coldness” that
they experienced there on the average, and relate it to our index.
Or we might feel it advisable to stratify our raters by socioeconomic
classes (e.g., rich, average, poor), and get separate
ratings from each class. The latter plan would require us to 
deal with the whole problem of the desirability of cities as places 
of residence from the point of view of each social class separately, 
which should provide a set of indexes of more value than any 
single index representing a gross average for all classes. In 
addition to subjective ratings, we might also set up, preferably 
by agreement, certain objective criteria of friendly or unfriendly 
cities, such as their methods of dealing with unfortunates, that 
are not included in our index, and test the latter against them. 

The final test of such an index, of course, is whether in practice 
it proves more useful than other methods in selecting cities that 
people will actually find desirable or undesirable places of 
residence, in accordance with the prediction of the score card. 

Ingenious ideas can often be used in testing the validity of 
an index. For example, if we are measuring attitude toward 
religion, we might see if our scale will place a group of ministers 
at the favorable end, a group of atheists at the unfavorable end, 
and average citizens for the most part in the middle. In work 
of this sort there are, however, many pitfalls that can be learned 
only from experience. 

Such tests of accuracy, reliability, and validity as mentioned 
above imply that the schedule will be tried out in the field on a 
small scale and carefully revised in the light of the results before 
it, the instructions, and the tables are put in final form. This 





preliminary trial almost invariably leads to some important 
changes, and should rarely be omitted from the routine of 
statistical research. 

9. The Interviewer. —After the schedule has been carefully 
prepared and tested, the purpose of the interviewer or data taker 
is merely to see that the questions are understood and answered 
to the best of the ability of the informant, or that the right data 
are accurately copied from the proper sources. The less the 
interviewer says or does beyond this, the more dependable the 
returns should be. He must be especially careful not to suggest 
answers to the informant, or to bias him in any way. While 
this may seem to be a negative role, it is one that calls for skill 
and judgment. The ability to induce informants of various 
kinds cheerfully to give accurate information, or to extract data 
without error from complex or confused records, is not common. 

It is often desirable to test the results obtained by each 
interviewer by noting whether an interviewer's returns differ too 
much from those of others reporting similar data. Also, when 
the schedules are edited, certain kinds of errors made by the 
interviewers may be noted. The interviewers may then be 
cautioned, or their work may be corrected for the personal 
equation. 
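
The comparison of one interviewer's returns with those of the others might be sketched as follows. The names, counts, and the limit of three standard errors are all hypothetical, and the test assumes that the interviewers worked on comparable assignments, so that their returns ought to differ only by sampling error.

```python
import math


def flag_interviewers(returns, threshold=3.0):
    """Flag interviewers whose "Yes" rate differs too much from all the others'.

    returns maps an interviewer's name to (number of "Yes" answers, schedules taken).
    """
    total_yes = sum(yes for yes, _ in returns.values())
    total_n = sum(n for _, n in returns.values())
    pooled = total_yes / total_n
    flagged = []
    for name, (yes, n) in returns.items():
        rest_yes, rest_n = total_yes - yes, total_n - n
        difference = yes / n - rest_yes / rest_n
        standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n + 1 / rest_n))
        if abs(difference) > threshold * standard_error:
            flagged.append(name)
    return flagged


# Hypothetical returns on one question from four interviewers.
returns = {"A": (41, 100), "B": (38, 100), "C": (44, 100), "D": (70, 100)}
print(flag_interviewers(returns))
```

A flagged interviewer is not necessarily at fault; the result only indicates whose schedules deserve a closer look when they are edited.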

In the gathering of information by schedule, several interview¬ 
ers or clerks may be supervised by a foreman, or the investigator 
may do all this work himself. In any case, the investigator 
should participate in the actual field or library work at least 
enough to acquire a firsthand knowledge of the conditions under 
which the data were obtained, and a “feeling” for the data, as it is
termed. Many an investigation has been saved or lost by the pres¬ 
ence or absence of the analyst during the data-collecting process. 

10. Editing the Schedules. —The schedules filled out during 
each day on a large study are generally sent in to a group of 
editors at the headquarters of the study. Under the direction 
of a chief, these clerical workers look for unfilled spaces, for 
inconsistent answers, and the like, on each schedule. Where 
necessary, a defective schedule is returned to the field or library 
foreman, who in turn hands it to the interviewer or the clerk 
whose initials appear on it. In small studies, the schedules 
taken during the day are often edited by the interviewers them¬ 
selves each night. 
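
The editor's search for unfilled spaces and inconsistent answers can be pictured as a short list of checks. The sketch below applies a few such checks to the household items of the enumerative check schedule (total persons, number under 14, and the persons listed, who should all be 14 or over). The dictionary layout and key names are hypothetical, and the checks are examples only, not the rules actually used by the Census editors.

```python
def edit_schedule(schedule):
    """Return a list of defects found on one schedule (empty if none)."""
    defects = []

    # Unfilled spaces among the household items.
    for item in ("location", "on_farm", "total_persons", "under_14"):
        if schedule.get(item) in (None, ""):
            defects.append(f"unfilled: {item}")

    # Items C and D must account for every person listed by name.
    total, under_14 = schedule.get("total_persons"), schedule.get("under_14")
    persons = schedule.get("persons", [])
    if isinstance(total, int) and isinstance(under_14, int):
        if under_14 > total:
            defects.append("inconsistent: more persons under 14 than in household")
        elif len(persons) != total - under_14:
            defects.append("inconsistent: names listed do not equal C minus D")

    # Only persons 14 years of age and over should be listed.
    for number, person in enumerate(persons, start=1):
        if person.get("age") is not None and person["age"] < 14:
            defects.append(f"inconsistent: person {number} is under 14")

    return defects


good = {"location": "12 Elm St.", "on_farm": False, "total_persons": 4,
        "under_14": 1, "persons": [{"age": 40}, {"age": 38}, {"age": 16}]}
bad = {"location": "", "on_farm": True, "total_persons": 3,
       "under_14": 1, "persons": [{"age": 35}]}

print(edit_schedule(good))
print(edit_schedule(bad))
```

A schedule returning a non-empty list would be sent back to the interviewer whose initials appear on it.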





11. Tabulation of the Data. —Edited schedules go to tabula¬ 
tors, who tally the data from the schedules to the tables, or, 
if machine methods are used, punch them on cards according to a 
code arranged for the purpose. Machines are, of course, faster 
and more economical for large-scale tabulation. 

The chief electrical machines now in use are the card punch 
and verifier, the sorting machine, and the tabulating machine. 
A general idea of what each does may be obtained from the 
following description: 


The first step in the use of a card for a particular record is the designation
of groups of columns as “fields.” Each field defines a section of
the card in which one particular type of information will always appear. 

The illustration following (Fig. 3) shows an 80-column card partly 
drawn up into fields. Each field is assigned a sufficient number of 
columns to include the largest number of digits which it will be called 
upon to accommodate. 






Fig. 3.—Eighty-column tabulating card.

For instance, the greatest number of months is 12 (a two-digit 
number), therefore, two columns are sufficient for recording this informa¬ 
tion. The greatest number of days in a month is thirty-one, thus this 
field too requires only two columns. The year is indicated by the last 
two digits, making two more columns necessary, etc. 

Figure 4 illustrates (a 45-column) card completely laid out for a 
specific job, in this instance a complex (criminological) study. 

At this point it is obvious that all pertinent information must be 
registered in the card in the form of punched holes. The perforation 
of these holes is a simple matter. The digits of the numbers to be 
transcribed correspond to the digits printed on the card. Thus, to 
show the date Oct. 15, 1934, on the card illustrated above (Fig. 3), 























































































































the card is perforated as follows: 10-15-34. Descriptive information, 
such as the names of persons or products, is generally coded numerically. 
Tabulating cards are perforated by means of an electric punching 
machine. The punch designed for the numerical system has a keyboard 
consisting of twelve keys, one for each punching position of a column. 
As a key is depressed a hole is cut and the card advanced automatically 
to the next column to be punched. The automatic features of the 
machine and the simplicity of the keyboard make the transcription of 
written data into punched-hole form easy, rapid and efficient. 


Fig. 4.—Forty-five column card with field headings.

When punching has been completed, the cards are usually in miscel¬ 
laneous order. The next step is to arrange them in sequence by some 
desired classification—that is, to group them according to some infor¬ 
mation which is punched in them. The Card-operated Sorting Machine 
is used for this purpose. 

The operation of the Electric Sorting Machine is based on the posi¬ 
tion of the punched hole in a vertical column of the card. As the cards 
pass through the machine a brush contact is made through the hole, 
causing an electrical circuit to be closed. This momentary circuit causes 
the card to be directed to a receiving pocket which corresponds to the 
position of the punched hole. For example, a card punched “9” in
the column under consideration is directed to the 9 pocket, a card
punched “6” in the same column is directed to the 6 pocket, etc. . . .

The automatic sort is made on one column at a time. It is apparent, 
therefore, that to arrange a group of cards in numerical sequence accord¬ 
ing to the data punched in a three-column field, the group is passed 
through the sorting machine three times. The sort is made first on the 



































































units column, then on the tens column and finally on the hundreds 
column. The Card-operated Sorting Machine is entirely automatic 
and operates at a speed of 400 cards per minute. 

The third step in the Punched Card method is the automatic com¬ 
pilation of the data into printed reports. This is accomplished by the 
Electric Tabulating Machine which is a combined adding, subtracting 
and printing machine. Punched cards passing through this machine 
actuate the various adding counters and printing mechanisms—again 
by means of electrical contacts . . . The machine is entirely automatic, 
operates at a speed of 150 cards per minute . . .¹
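
The column-by-column sort described in the quotation (units first, then tens, then hundreds) is what is now called a least-significant-digit radix sort. The sketch below imitates it with ten “pockets” for the digits 0 to 9. Cards are represented as strings of digits, an assumption made for illustration; the physical machine's handling of the other punching positions is not modeled.

```python
def sort_on_column(cards, column):
    """One pass through the sorter: each card drops into the pocket for its digit."""
    pockets = [[] for _ in range(10)]
    for card in cards:
        pockets[int(card[column])].append(card)
    # Gather pockets 0 through 9, keeping the order of cards within each pocket.
    return [card for pocket in pockets for card in pocket]


def sort_on_field(cards, first_column, width):
    """Sort on a multi-column field, one pass per column, rightmost column first."""
    for column in range(first_column + width - 1, first_column - 1, -1):
        cards = sort_on_column(cards, column)
    return cards


# Hypothetical cards carrying a three-column field in columns 0-2.
cards = ["407", "012", "395", "120", "399", "011"]
print(sort_on_field(cards, first_column=0, width=3))
```

The method works because each pass preserves the order left by the earlier passes among cards that fall in the same pocket, which is why the sort must begin with the units column.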

12. Analysis of the Data. —The analysis of the data should 
proceed along the lines laid down in planning the study, although 
any minor modifications or extensions that later appear advisable 
may be made. This means that the data have already been 
put in work tables for computing means, percentages, standard 
deviations, correlations, or whatever other statistics are needed 
for simplifying and interpreting the findings. After these 
statistics have been worked out and their accuracy has been 
carefully checked, the investigator should state the results as 
simply, briefly, and clearly as he can. Only a few of the most 
vital tables should be presented with the text of the report, all 
others that seem desirable being placed in an appendix. Where 
graphic devices promise to be effective, they can be introduced. 

Perhaps the most important things to keep in mind at this 
crucial stage of an investigation are to limit the conclusions to 
what the data show, while yet seeking to use enough imagination 
and insight to discover all of the pertinent information that may 
be extracted from the findings. There are, of course, no rules 
by which this can be done. Everything depends upon the 
ability, integrity, training, and persistence of the analyst. 

13. The Amount of Error of Observation or Record in Statistical
Results.²—The readers of sociological studies are not
unreasonable when they express an attitude of skepticism 
toward the elaborate precision of some of the statistical tech¬ 
niques that are frequently applied to social data of doubtful 
character. 

¹ Herbert Arkin, in G. W. Baehne, ed., Practical Applications of the
Punched Card Method in Colleges and Universities, pp. 4-8, Columbia University
Press, New York, 1935.

² Adapted from a paper by T. C. McCormick, On the Amount of Error in
Sociological Data, American Sociological Review, Vol. 3, pp. 328-332, 1938.





The major difficulties involved in the estimation of errors of
observation are practical rather than mathematical and theo¬ 
retical in nature. Determination of the accuracy of findings is 
first a question of funds and time, and is tied up with administra¬ 
tive policies. 

In England, the importance of estimating errors of record in 
sociological results has been recognized by Arthur L. Bowley, 
who writes: 

If we do not know of the existence of biassed errors, which in reality 
pervade our estimates, there is no remedy; if we know them, we are 
likely to obtain more accuracy by the most erroneous corrections for 
them than by neglecting them. ... In the nature of things, when we 
are dealing with errors we do not know their magnitude; the most we 
can know is their probable and possible extent. We might estimate, for 
instance, the percentage of unemployed in a certain year as 4.5, and 
add, from information in our possession (coming from a study of wage 
bills or the reports of relief agencies), that we considered this to be 
within .5 of the fact; we should then write the number 4.5 ± .5, meaning
that the error in the estimate as defined above was unlikely to be
more than .5/4.5 = 1/9, or 11 per cent, the corresponding absolute error
being .5. In such a case we can also give definite limits. The per¬ 
centage employed must lie between 0 and 100; and if we could actually 
enumerate 1 per cent of the working-class as out of work, and also 92 
per cent as in work, we should know that the number required was 
between 1.0 and 8.0 per cent, and the maximum error in our estimate, 4.5, 
was 3.5/4.5 = 7/9, or 78 per cent. Even this is more precise than the
original statement, “the percentage is 4.5, error unknown.” By further
investigation we might perhaps bring the limits of error nearer to each
other, and decide that it was practically certain that the percentage
required was between 3.5 and 4.5; then we ought to say “the number
unemployed is .04 . . . of the working class, the estimate being correct
to the last figure given.” This statement is of the same nature as, “The
body weighs 15 lb. 3 oz., correct to an ounce.”¹
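
Bowley's arithmetic can be retraced in a few lines. The sketch below uses only the figures given in the quotation; the function and variable names are introduced here for illustration.

```python
def relative_error(absolute_error, estimate):
    """Ratio of the (possible) error to the estimate itself."""
    return absolute_error / estimate


estimate = 4.5  # estimated percentage unemployed

# Likely error: the estimate is believed to be within 0.5 of the fact.
likely = relative_error(0.5, estimate)

# Definite limits: 1 per cent counted out of work and 92 per cent in work,
# so the true percentage lies between 1.0 and 100 - 92 = 8.0.
lower, upper = 1.0, 100.0 - 92.0
largest_absolute_error = max(estimate - lower, upper - estimate)
maximum = relative_error(largest_absolute_error, estimate)

print(f"likely relative error:  {likely:.0%}")
print(f"maximum relative error: {maximum:.0%}")
```

The two results are 1/9 and 7/9 of the estimate, or about 11 and 78 per cent, as in the quotation.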

As yet, most of the theory underlying the subject of errors 
consists of a number of precautions that simply need to be borne 
in mind and observed. What seem to be the outstanding points 
are briefly summarized below. 

¹ A. L. Bowley, Elements of Statistics, 6th ed., pp. 180, 181, 192, Charles
Scribner’s Sons, New York, for P. S. King & Son, Ltd., London, 1937. 





(1) By definition, “The relative error in an estimate is the
ratio of the difference between the estimate and the true
value, to the estimate.”

(2) Where the necessary a priori information exists, the 
results of an investigation may be compared with expecta¬ 
tion, and the extent of the error suggested in this way. 
The basis of the expectation must, of course, be justified. 

(3) In the absence of adequate comparative data, the only 
possible method of finding errors of measurement or 
record is to repeat the measurements, or a sufficient propor¬ 
tion of them. These check measurements may be made 
with the same measuring instruments, or by other devices 
and approaches, to reveal possible errors due to a particular 
method or scale. A change of personnel to find the 
amount of error attributable to the “personal equation” 
is also important. 

(4) Where differences between the original and the check 
measurements are found, investigation should continue 
until it is possible to correct the error sufficiently for the 
purpose in hand by averaging or other estimate. 

(5) There are two well-known kinds of error of measurement 
or record, whose treatment is different: 

a. Unbiased or compensating errors. Some errors occur 
in opposite directions, and so wholly or partly cancel 
out in sums, averages, and other statistics. Such 
random errors, however, increase the value of the 
standard deviation and attenuate the correlation 
coefficient.¹ 

b. Biased errors, or errors in the same direction: 

(a) Constant error. An error that remains the same 
from one measurement to the other, as when a foot 
rule is inaccurately divided, is usually hard to 
detect, but very common. In social investigation 
it may be due to wishful thinking, to loose definition, 
to falsification on the part of the subjects 
interviewed, and so on. 

(b) Accumulative error. Some biased errors increase 
from measurement to measurement, as when one 
is dealing with more and more difficult material. 
Thus, in taking the census, it is less easy to get 
accurate answers to certain questions from Negroes 
than from whites. 

(c) Irregular noncompensating error. When measurements 
vary erratically, so that they affect sums 
and averages in important but unpredictable ways, 
the error must be estimated or eliminated in each 
separate measurement. 

¹ See Chaps. VIII and X for definition of these terms. 

Apart from ingenuity and perseverance, there is no formula 
for finding such errors as these. Where they are suspected 
but not discoverable, it may be advisable to express results 
in the form of ratios, since biased errors are reduced in ratios and 
index numbers. As Bowley puts it, “The error in a ratio is 
approximately the difference between the errors in its two 
terms. . . .” 
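
Bowley's rule can be checked with a little arithmetic. The sketch below is only an illustration: it assumes that "error" means relative error as defined in point (1), and the figures (a numerator overstated by 5 per cent, a denominator overstated by 4 per cent) are invented for the purpose and do not come from any study.

```python
def relative_error(estimate: float, true_value: float) -> float:
    """Relative error as defined in point (1): (estimate - true) / estimate."""
    return (estimate - true_value) / estimate

# Hypothetical true values and biased estimates (illustrative only).
true_num, true_den = 1000.0, 20000.0
est_num = true_num * 1.05   # numerator overstated by 5 per cent
est_den = true_den * 1.04   # denominator overstated by 4 per cent

err_num = relative_error(est_num, true_num)
err_den = relative_error(est_den, true_den)
err_ratio = relative_error(est_num / est_den, true_num / true_den)

print(f"numerator error:   {err_num:.4f}")
print(f"denominator error: {err_den:.4f}")
print(f"ratio error:       {err_ratio:.4f}")
print(f"difference:        {err_num - err_den:.4f}")
```

Because both terms are biased in the same direction, the error of the ratio is much smaller than the error of either term, and it lies close to the difference between the two.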

In social investigation it is especially important to avoid 
misleading accuracy of statement, such as carrying calculations 
based on crude data to two or three decimal places. The problem 
of how far not to carry significant figures should invariably be 
solved on the conservative side, as when, in rough population 
estimates running into the millions, even the tens of thousand 
places are given to zeros, and the hundreds of thousands are 
rounded off. 

The final statement of an average or other statistic should 
include the maximum amount by which it may reasonably be in 
error, expressed as a percentage of the value of the statistic, 
as already mentioned above. For example, suppose that the annual 
church attendances per individual are recorded as 58, and that 
the error of record in this figure is estimated to be 10 per cent. 
These facts may be expressed in some such form as 58 ± 10%. 
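
A statement of this kind implies definite limits, which may be worked out as follows. This is a minimal sketch; the function names `error_bounds` and `format_with_error` are our own and are not part of any standard library.

```python
def error_bounds(value: float, percent_error: float) -> tuple[float, float]:
    """Return the (low, high) limits implied by 'value ± percent_error%'."""
    margin = value * percent_error / 100.0
    return value - margin, value + margin

def format_with_error(value: float, percent_error: float) -> str:
    """Write a statistic together with its maximum reasonable error."""
    return f"{value:g} ± {percent_error:g}%"

low, high = error_bounds(58, 10)
print(format_with_error(58, 10))            # 58 ± 10%
print(f"limits: {low:.1f} to {high:.1f}")   # limits: 52.2 to 63.8
```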

As Bowley warns, it sometimes takes longer to estimate the 
approximate amount of error in the results of a study than it 
does to make the study itself. If sociologists give proper 
attention to the accuracy of their findings, therefore, they are 
certain to be forced by the interests of economy of time and 
money to simplify their problems and to investigate the same 
population as often as feasible. This is true if accuracy is 





regarded as a purely relative thing, which need be no greater 
than is required to obtain a satisfactory answer to a question in 
hand. 


Exercises 

1. What use may be made of nonquantitative methods in statistical 
research? 

2. Make a list of the main requirements of a well-chosen statistical 
problem, and give illustrations of what you consider good and poor, 
with your reasons. 

3. With which of the chief sources of secondary statistical data in 
the United States are you acquainted? 

4. Select from the latest United States Census a few definitions that 
seem to you (a) satisfactory, (b) unsatisfactory, and explain why you 
think so. 

5. What are some of the most unreliable counts in the United States 
Census of Population, and why? 

6. Collect instances of studies in which a questionnaire was mailed 
out and report on the proportion and representativeness of the returns 
received. 

7. a. In the statistical laboratory, propose problems on a competitive 
basis; and after a problem has been chosen, help design a study which 
your class in social statistics will carry out as a semester's project. 

b. Does the problem satisfy the requirements that you listed under 
question 2 above? 

c. Indicate by which of the methods described in Chap. II the most 
important traits or factors concerned in this study will be measured, 
and show that no more exact measurement is feasible. 

d. What is the dependent variable? 

e. What are the main independent variables? 

f. What are the important interfering factors? 

g. How will the interfering factors be controlled? 

h. Is the sample adequate in size? 

i. What is your assurance that it is representative? 

j. Does the schedule meet the demands mentioned in this chapter? 
Review the points. 

k. Do you have all the tables that will be needed for computation 
and exhibition purposes, and for interpreting the data? 

l. Do your instructions leave any important terms undefined, or 
any procedures unexplained? 

m. By what methods do you propose to test the reliability and, if 
necessary, the validity, of your schedule? 



n. To what extent have you used the method of cooperative definition 
to improve the validity of your indexes? 

o. Will you try to measure the error due to the personal equation of 
the interviewers? 

p. How will you estimate the amount of error in your final results? 





PART II 

Statistical Methods 




CHAPTER V 

TABULATION OF FREQUENCY DISTRIBUTIONS 


1. A Problem.—Before large groups of figures of any kind 
can be studied and interpreted, they must be arranged, or 
tabulated, in some orderly and meaningful way. 

As a first exercise in the tabulation of statistical data, let 
us investigate the sizes of sibling families from which the students 
at a given college come. A definition is needed. What is meant 
by ‘‘sizes of sibling families’’? Let us say that we mean the 
number of brothers and sisters, including the student. The 
sibling family, then, is the thing to be measured, while a sibling 
is the unit of count or measurement. Are siblings deceased to be 
counted? What of siblings married and moved away? What 
of adopted siblings, or other children not brothers or sisters 
reared in the family? Always such questions of definition 
of the thing to be measured and of the unit of measurement 
arise in the beginning of a careful inquiry, statistical or otherwise, 
and must be settled with the purpose of the investigator in view. 
In the present case, let us say that deceased siblings, siblings 
away from home, and children adopted or reared as siblings 
in the family shall be included. 

Assuming that we have defined the thing to be counted or 
measured, a sibling family, and the unit of count or measurement, a 
sibling, and that the units are equal and equivalent for our purpose, 
we then ask each student to tell the size of his sibling family. 
Let us imagine that 200 students give the following sizes of sibling 
families. 

 2  2  4  1  3  1 12  6  7  2
 3  1  5  2  3  4  1  2  2  5
 2  6  1  4  1  2  3  3  6  3
 3  2  3  1  1  2  2  2  3  4
 1  1  2  2  2  2  3  8  2  2
 5  8  2  3  1  2  5  3 15  6
 2  3  1  1  3  9  1  8  2  2
 3  1  4  3  3  4  1  2  2  2
 8  2  2  1  2  2  3  7  3  6
 4  2  5  2  1  2  5  3  2  8
 1  7  6  9  1  1  1 12  5  4
 1  3  1  1  3  3  1  5  2  2
 2  6  3  5  1  5  2  2  3  5
 2  5  3  3  2  5  5  2  3  2
 2  4  5  1 14  1  7  3  6  2
 3  2  1  4  4  4  1  2  2  6
 4 11  3  1  1  2  3  6  3  4
 1  3  2  1  4  4  1  6  2  2
 3  2  1  4  6  2  5  3  4  2
 4  4  1  4  4  1  3  4  4 10


2. The Frequency Distribution: Discrete Variable.—We have 
here 200 values, varying from 1 to 15. So far, the answer to our 
wish to know the sizes of sibling families to which the students 
belong is rather confusing. The range, or spread between the 
smallest and the largest values, is the clearest bit of information 
we have. It extends from 1 to 15, and is therefore 14. We 
should also like to know how many families of each size there are. 
As a preliminary step to this end, it is convenient to put the 
items in the form of an array, which means merely putting them 
in order of size. 

Table 5.—Array op Values 
1111222222333 3 44556 8 
1111222222333344556 8* 
1111222222333444566 9 
1111222222333444566 9 
1111222223333444567 10 
1111222223333444567U 
1111222223333445567 12 
1111222223333445567 12 
1111222223333445568 14 
1112222223333445568 15 


As a rule, however, a better form of the array is the frequency 
array. It is obtained from the original data by setting up a 
consecutive series of numbers covering all the observed values 
(here, the sizes of sibling families, 1, 2, 3, etc.) in the left-hand 
column of Table 6, and tallying in the right-hand column the 
number of times each consecutive value (size of family) occurs.¹ 
The latter figures are termed frequencies. 

¹ Tallying is commonly done by making in the proper row a sloping 
stroke for each item (e.g., family) until four strokes are made, then drawing 
a stroke through them for the fifth item, so that the marks fall into 
groups of five. The tallies 





Table 6.—Frequency Array of Values

Size of            Students Reporting
Sibling Family     (Frequencies)

      1                  39
      2                  55
      3                  38
      4                  24
      5                  16
      6                  12
      7                   4
      8                   4
      9                   2
     10                   1
     11                   1
     12                   2
     13                   0
     14                   1
     15                   1

   Total                200


This gives the same information as Table 5, but in a much 
more compact form. Table 6 also satisfies our curiosity relative 
to the number of students reporting each size of sibling family. 
We see at once that most students are members of families of 
three or fewer siblings. 
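
The three steps taken so far (range, array, frequency array) may be sketched in a few lines of Python. The ten values used here are only a short illustrative sample, not the full set of 200, and the variable names are our own.

```python
from collections import Counter

sizes = [2, 2, 4, 1, 3, 1, 12, 6, 7, 2]   # illustrative sample, not the full 200

value_range = max(sizes) - min(sizes)      # spread between smallest and largest
array = sorted(sizes)                      # the "array": items in order of size
tally = Counter(sizes)                     # one tally mark per item

# Frequency array: every consecutive value from smallest to largest,
# including values that never occur (frequency 0).
frequency_array = {v: tally.get(v, 0)
                   for v in range(min(sizes), max(sizes) + 1)}

print("range:", value_range)
print("array:", array)
for value, freq in frequency_array.items():
    print(f"{value:>2}  {freq}")
print("total:", sum(frequency_array.values()))
```

As with hand tallying, the total of the frequencies should agree with the number of original items.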

We shall next try lumping together into classes, or class 
intervals, more than one size of family, with the double purpose 
of showing more smoothly how the students are grouped with 
respect to size of family, and of more easily calculating averages² 
and other statistics from the table. Imagine combining into 
classes family sizes 1 and 2, 3 and 4, 5 and 6, 7 and 8, and so on. 
We then get Table 7. 

The work of combining the frequencies should be carefully 
checked by repetition, and it should be noted that the total is 
the same as for Table 6. 


are next counted, and a figure representing the total number of items 
in the row is entered in the frequency column of the table. The work of 
tallying should be repeated, as a check, and the total should agree with the 
number of original items (sibling families). When machine methods of 
tallying are used, the sorting machine counts the frequency in each class, 
and the resulting totals are simply read off and entered in the table. 

² But the average found from Table 7 will be less accurate than that found 
from Table 6. 





Table 7.—Frequency Distribution of Data

Size of            Students Reporting
Sibling Family     (Frequencies)

 1 and 2                 94
 3 and 4                 62
 5 and 6                 28
 7 and 8                  8
 9 and 10                 3
11 and 12                 3
13 and 14                 1
15 and 16                 1

Total                   200


Table 7 is still more concise than Table 6 and the distribution 
of the frequencies is more regular. There are no classes of zero 
frequencies, but instead a rather steady decline in the number of 
cases as the size of family increases, which is what one would 
expect. 
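
The combining of frequencies that leads from Table 6 to Table 7 can be sketched as below. The frequencies are those of Table 6; the helper `group_frequencies` is a hypothetical name of our own, and the sketch assumes whole-number values and classes of equal width.

```python
# Frequencies of Table 6: size of sibling family -> students reporting.
table6 = {1: 39, 2: 55, 3: 38, 4: 24, 5: 16, 6: 12, 7: 4, 8: 4,
          9: 2, 10: 1, 11: 1, 12: 2, 13: 0, 14: 1, 15: 1}

def group_frequencies(freqs, width, start=1):
    """Combine a frequency array into classes of `width` consecutive values."""
    grouped = {}
    for value, f in freqs.items():
        lower = start + ((value - start) // width) * width
        limits = (lower, lower + width - 1)       # lowest and highest value in class
        grouped[limits] = grouped.get(limits, 0) + f
    return grouped

table7 = group_frequencies(table6, width=2)
for (lo, hi), f in sorted(table7.items()):
    print(f"{lo:>2} and {hi:<2} {f:>4}")
print("Total", sum(table7.values()))
```

The printed total serves as the check mentioned above: it must equal the total of Table 6.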

Tables 6 and 7 are called simple frequency distributions, or 
merely frequency distributions, because they show the frequency of 
occurrence of a set of values arranged in order of size. Table 6 
was also called a frequency array because successive class values 
increased by single units. 

In Table 7 the question arises, What is now the size of family 
in each class? In the first class, is the size of family the average 
of 1 and 2 = 1.5? This is more reasonable than to say that the 
size is either 1 or 2. But how can a family consist of one person 
and a half person? Is not this taking liberties with the data? 
The trouble is due to the circumstance that we are dealing with a 
discrete series, i.e., a series that can take only certain values (whole 
numbers) and no intermediate values. Thus a sibling family 
may contain 1, 2, or 3 members, but not 1.3, 2.7, or 3.6 members, 
because people always come in wholes! In contrast to a discrete 
series is a continuous series, in which the variable¹ may assume 
any whole or decimal value whatever. The ages in years of the 
students in a sample represent a continuous series: 19.3, 20.4, 
20.6, 21.2, 21.7, 21.9, 22.1, 22.5. While a continuous series can 
always be mathematically averaged without logical offense, 
this is not true of a discrete series. For example, the ages in 
years of five students are 19.3, 20.4, 21.7, 21.9, 22.1, and their 
average (arithmetic mean²) age is 21.08, which is a possible value. 
But if five sibling families are of sizes 1, 2, 2, 3, and 5, respectively, 
the mean is 2.6, which is a fictitious value. We are thus 
faced with the dilemma either of disregarding the logical nature 
of a discrete series, or of abandoning the attempt to analyze it 
in terms of averages and other mathematical concepts. Since 
the purpose of an average is to simplify and represent a series, a 
fractional value may serve this end in the case of a discrete series, 
even though it is not strictly realistic, and many valuable facts 
can be discovered in this way that otherwise would not appear. 
For these reasons, discrete series are usually thrown into frequency 
distributions and treated in some ways as if they were 
continuous. 

¹ A quality (e.g., sibling family) that varies in size or amount. 
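
The contrast between the two means can be verified directly. The sketch below uses the figures of the text and the standard `statistics` module; it merely shows that the mean of the discrete series is not itself one of the values the variable can take.

```python
from statistics import mean

ages = [19.3, 20.4, 21.7, 21.9, 22.1]   # continuous series (years)
sizes = [1, 2, 2, 3, 5]                 # discrete series (siblings)

mean_age = mean(ages)
mean_size = mean(sizes)

print(f"mean age:  {mean_age:.2f}")     # a value an age could actually take
print(f"mean size: {mean_size:.1f}")    # no family can have this many members
print("mean size is a whole number:", float(mean_size).is_integer())
```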

Returning now to Table 7, we may regard the average value 
of the two sizes of families grouped together in each class as the 
mid-point of the class (e.g., for the first class of Table 7, the 
mid-point is $\frac{1 + 2}{2} = 1.5$). When any item is placed in a class 
with other terms, it is understood that it thereupon exchanges its 
original value for that of the mid-point of the class. For example, 
when a family of 4 siblings is placed in the class 3 and 4 in 
Table 7, the 4 is thereafter treated as if it were 3.5. The mid-point 
of any class should, therefore, always be as close as 
possible to the true average of the items included in the class. 
From Table 6 we see that the true weighted³ mean size of the 
families of 1 and 2 siblings is 

$$\frac{(39 \times 1) + (55 \times 2)}{94} = \frac{149}{94} = 1.585,$$

whereas our mid-point is 1.5. This is rather close agreement, 
and may be satisfactory for our purposes. The mid-points of 
other classes may be similarly tested. From Table 6 it can be 
seen that the mid-point of the first class in Table 7 is too small, 
because there are more families of 2 than of 1; but this is somewhat 
offset by too large a mid-point in the next class; and so on. 
Where one error balances another in this way, the accuracy of the 
mean found from the table is improved, although the mid-points 
of some of the classes may not be too good. Recasting Table 7 
in mid-point form, we have 

² (19.3 + 20.4 + 21.7 + 21.9 + 22.1)/5 = 21.08. See Chap. VII. 

³ In the weighted mean, each value (e.g., size of family) is counted as 
often as it occurs. 




Table 8.—Frequency Distribution of Data: Mid-point Form

Size of sibling family,    Students          Product
mid-point (X)              reporting (f)     (fX)

       1.5                      94            141.0
       3.5                      62            217.0
       5.5                      28            154.0
       7.5                       8             60.0
       9.5                       3             28.5
      11.5                       3             34.5
      13.5                       1             13.5
      15.5                       1             15.5

Total                          200            664.0


If we calculate the arithmetic mean from Table 8, we may 
compare it with the true mean found from Table 6. To find 
the mean, we multiply each mid-point by its frequency, sum 
the products, and divide by 200. This gives for the data of 
Table 8 a mean of 3.32, and for Table 6 a true mean of 3.315, 
which in this case are nearly identical. We may, therefore, 
approve Table 8 as far as this test is concerned. 
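
The comparison just made may be reproduced as follows. The frequencies are those of Tables 6 and 8; `weighted_mean` is a helper name of our own choosing.

```python
# Table 6: each size of family with its frequency.
table6 = {1: 39, 2: 55, 3: 38, 4: 24, 5: 16, 6: 12, 7: 4, 8: 4,
          9: 2, 10: 1, 11: 1, 12: 2, 13: 0, 14: 1, 15: 1}

# Table 8: class mid-points with the combined frequencies.
table8 = {1.5: 94, 3.5: 62, 5.5: 28, 7.5: 8,
          9.5: 3, 11.5: 3, 13.5: 1, 15.5: 1}

def weighted_mean(freqs):
    """Multiply each value by its frequency, sum, and divide by the total frequency."""
    total = sum(freqs.values())
    return sum(x * f for x, f in freqs.items()) / total

true_mean = weighted_mean(table6)              # 663 / 200
grouped_mean = weighted_mean(table8)           # 664 / 200
first_class = weighted_mean({1: 39, 2: 55})    # 149 / 94, against a mid-point of 1.5

print(f"true mean (Table 6):    {true_mean:.3f}")
print(f"grouped mean (Table 8): {grouped_mean:.3f}")
print(f"true mean of class 1-2: {first_class:.3f}")
```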

In the case of the data on sibling families, we need to show only 
the lowest and highest whole numbers that can fall within a 
class, because we are dealing with discrete or whole numbers. 
These upper and lower limits of a class are called class limits. 
We may set up the stub, or first column, of the frequency 
distribution as shown in Table 7, or, if we prefer, we may write 1-2, 
3-4, 5-6, and so on. Frequency distributions are usually given 
in class limit rather than in mid-point form, but the latter is also 
common. The former is better suited for tallying, the latter for 
computing purposes. 

3. Selection of a Class Interval.—The suggestions usually 
given to aid in choosing a class interval for untabulated data are: 

1. Note the range of the data, i.e., the difference between the 
largest and smallest values of the variable. 

2. Decide about how large the interval has to be to make a 
significant difference in the data. For example, a difference of 
less than five points in a distribution of students' grades would 
seem to be of no consequence, since most teachers make no 
attempt to grade closer than that. Indeed, 10 points may 
seem to some sufficiently close. 

If the values already have some natural spacing, the latter 
should often be taken as the interval. For example, the size of 
farms in certain regions tends to be a multiple of 40 acres: 40, 
80, 120, 160, etc. 

3. Consider how many class intervals would result if the size 
of interval tentatively chosen in (2) above were divided into 
the range found in (1). As a rule, from 10 to 20 intervals are 
desirable, although, of course, more or fewer are permissible. 
Revise the size of interval suggested in (2) somewhat, if it seems 
advisable. 

4. Make all intervals of equal size, if feasible, and avoid 
open end intervals when possible. 

5. Decide tentatively upon the mid-points and class limits of 
the intervals. Unless difficulty in classification is introduced, 
the mid-points should be whole numbers for convenience in 
computing, and if they can be multiples of 5's or 10's, so much 
the better. 

6. Tally the data in the class intervals chosen. Note whether 
the resulting distribution reveals a smooth trend in the frequencies 
from one end of the scale to the other, avoiding an 
irregular, broken effect. If too large an interval has been used, 
some points of interest relative to increase or decrease of frequencies 
will be concealed. If the interval is too small, the 
distribution will lack smoothness. It is often necessary to try 
tabulations by larger and smaller intervals to decide these 
points. 

7. The accuracy of the class interval chosen for computation 
purposes should be tested by calculating the arithmetic mean 
from the table and comparing it with the true mean found from 
the ungrouped data or from a large random sample of the 
ungrouped data. To obtain a class interval that will give 
maximum accuracy it is often helpful to use a sliding scale 
device like that illustrated below (Fig. 6, applied to Fig. 5). 

The application of these suggestions will be illustrated. 
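
Suggestions (1) to (3) amount to a small piece of arithmetic, sketched here for whole-number data. The extreme values 63 and 94 are the lowest and highest grades in the list that follows; the function `interval_count` and the trial widths are our own choices for illustration.

```python
import math

def interval_count(lowest: int, highest: int, width: int) -> int:
    """Number of whole-number classes of the given width needed to cover lowest..highest."""
    distinct_values = highest - lowest + 1      # discrete data: both ends count
    return math.ceil(distinct_values / width)

lowest, highest = 63, 94                        # extreme grades in the list that follows
print("range:", highest - lowest)
for width in (2, 5, 10):
    print(f"interval of {width:>2}: {interval_count(lowest, highest, width)} classes")
```

The count is only a guide; the choice among the trial widths still rests on the accuracy of the data and on the smoothness of the resulting distribution.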

Below is a list of the final grades of a class in statistics: 

80 81 87 83 94 
94 85 78 82 85 
85 81 87 87 65 
88 81 80 75 70 
73 63 77 80 88 
78 73 68 79 
68 83 70 83 
72 84 74 88 
76 90 85 88 

Ordinarily it is not worth while to set up a frequency distribution 
for 41 cases, but a small number is used here for convenience. 

The first step is to arrange these values in order of size, to 
form a frequency array. 


Table 9.—Frequency Array of Values

Grades    Frequency      Grades    Frequency
 (X)         (f)          (X)         (f)

  63          1            79          1
  65          1            80          3
  68          2            81          3
  70          2            82          1
  72          1            83          3
  73          2            84          1
  74          1            85          4
  75          1            87          3
  76          1            88          4
  77          1            90          1
  78          2            94          2

Total                                 41


The range is 94 - 63 = 31. Intervals of less than five do not 
seem justified by the accuracy of the data. A natural grouping, 
or tendency for the grades to cluster about multiples of five, 
would be expected. There would be only three or four intervals 
of 10, which seem too few. A trial interval of five, with mid-points 
at 65, 70, 75, etc., is shown in Table 10. These mid-points 
are especially appropriate, because the clustering of 
the grades around them should increase the accuracy of the 
table for computing averages and other statistics, as illustrated 
above. Narrow class intervals are also generally more accurate 
for computation purposes than are wide ones. Like the data 
on sibling families, these percentage grades are given only in 
whole numbers, and so may most conveniently be regarded as 
discrete. If 65 is taken as the mid-point of an interval of 5, 
evidently the lowest grade that belongs in this interval is 63 
and the highest is 67, so that the five grades 63, 64, 65, 66, and 
67 are included. If the width of the class interval were an 
even number, such as 4, instead of an odd number, such as 5, 
the mid-points would be forced to take a decimal value, as was 
the case in Table 7. 

Fig. 5.—Data of array on page 66 (Table 9) plotted on unit scale. 
[Each grade is marked by an X above a grade scale labeled at 
63, 68, 73, 78, 83, 88, and 93.] 

Fig. 6.—Sliding scale, with trial interval of 4. 


Table 10.—Frequency Distribution of Students' Grades

Grades       L*        Frequency       fX
 (X)                      (f)

  65        63-67           2          130
  70        68-72           5          350
  75        73-77           6          450
  80        78-82          10          800
  85        83-87          11          935
  90        88-92           5          450
  95        93-97           2          190

Total                      41        3,305

* L means class limits. 


From inspection of Fig. 5, where each value has been plotted 
along the grade scale, it appears that the above choice of class 
intervals throws the mean below the mid-point in the intervals 
63-67, 68-72, 73-77, 83-87, 88-92, 93-97. Only in two intervals, 
however, 88-92, and 93-97, is the lack of balance serious. In 
one interval, 78-82, the mean is at the mid-point. The true 
mean of the series, computed from the separate values, is 80.195. 
The mean found from the frequency distribution of Table 10 is 
80.610, which is 0.415 too high, as would be expected. If this 
amount of inaccuracy is considered important for the purpose 
in hand, an attempt to obtain a better class interval should be 
made. This may be facilitated by making a sliding scale from 
ordinary coordinate paper, using the same units as in Fig. 5 
(see Fig. 6). Class intervals of different sizes may be measured 
off on the sliding scale, and each tested in turn against the scale 
in Fig. 5. In the case of any given interval, the trial scale (Fig. 6) 
is moved along the fixed scale (Fig. 5) until the frequencies 
shown on the fixed scale are as evenly balanced as possible 
around the mid-points of the trial class interval on the sliding 
scale. If satisfactory, the values of the class limits may then 
be read off on the fixed scale from the intervals on the sliding 
scale when in this position of balance. Usually, some inaccuracy 
is inevitable in the use of class intervals. The problem is to 
keep it within such limits that no serious damage will result to 
the conclusions of the study. 
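
The test of Table 10, and a numerical stand-in for the sliding scale, may be sketched as follows. The grades are the 41 values listed earlier. Shifting the lowest class limit one unit at a time is our own substitute for moving the paper scale, not a procedure given in the text, and `grouped_mean` is a hypothetical helper.

```python
grades = [80, 81, 87, 83, 94, 94, 85, 78, 82, 85, 85, 81, 87, 87, 65,
          88, 81, 80, 75, 70, 73, 63, 77, 80, 88, 78, 73, 68, 79,
          68, 83, 70, 83, 72, 84, 74, 88, 76, 90, 85, 88]

def grouped_mean(values, width, lowest_limit):
    """Mean computed as if every value sat at the mid-point of its class."""
    total = 0.0
    for v in values:
        lower = lowest_limit + ((v - lowest_limit) // width) * width
        midpoint = lower + (width - 1) / 2      # whole-number (discrete) classes
        total += midpoint
    return total / len(values)

true_mean = sum(grades) / len(grades)
table10_mean = grouped_mean(grades, width=5, lowest_limit=63)
print(f"true mean:     {true_mean:.3f}")
print(f"Table 10 mean: {table10_mean:.3f}")

# Try the same width with the class limits shifted one unit at a time.
for start in range(59, 64):
    m = grouped_mean(grades, width=5, lowest_limit=start)
    print(f"classes starting at {start}: mean {m:.3f}, error {m - true_mean:+.3f}")
```

Whichever position gives the smallest error must still be weighed against the convenience of mid-points that are whole numbers or multiples of five.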

Table 11.—Birth Rates per 1,000 in 150 Approximately Equal 
Populations

Class limits     Mid-points     Frequencies
    (1)             (2)             (3)

 12.5-13.4          13               3
 13.5-14.4          14              15
 14.5-15.4          15              26
 15.5-16.4          16              31
 16.5-17.4          17              43
 17.5-18.4          18              25
 18.5-19.4          19               5
 19.5-20.4          20               2

Total               ..             150


4. The Frequency Distribution: Continuous Variable.—A continuous 
variable, such as birth rates, is tabulated in the same 
way as a discrete variable, except that a slight modification is 
needed in finding class limits from mid-points, and vice versa. 
In Table 11, given the mid-points of col. (2), what are the lower 
and upper values of each class within which the birth rates can 
be classified? The boundary line between any two mid-points 
should evidently be halfway between them, in this case 1/2 of 1, 
or 0.5 unit above the lower or below the higher mid-point. We 
thus get as our class limits in col. (1), 12.5, 13.5, 14.5, and so on. 
Notice that the upper limit of any class is made slightly smaller 
than the lower limit of the class just above, to indicate that a 
case falling exactly on the border line between two classes is 
placed in the upper class rather than in the lower.¹ Assuming 
that our original data carry only one decimal place, it is enough 
to write the upper limit of the class 12.5 to 13.5, for example, as 
13.4; but if the data carried two decimal places, the upper limit 
should be written 13.49; and so on. 
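
The rule for class limits may be sketched as below for the mid-points of Table 11. The function names are our own, and the sketch assumes equally spaced mid-points. Averaging adjacent lower limits, which recovers the mid-points, is included as a check.

```python
def class_limits(midpoints, decimals=1):
    """Written class limits for equally spaced mid-points of a continuous variable."""
    width = midpoints[1] - midpoints[0]
    unit = 10 ** -decimals               # smallest step the data can record
    limits = []
    for m in midpoints:
        lower = m - width / 2            # boundary halfway to the mid-point below
        upper = m + width / 2 - unit     # written just under the next lower limit
        limits.append((round(lower, decimals), round(upper, decimals)))
    return limits

def classify(value, midpoints):
    """Index of the class for `value`; a borderline value goes to the upper class."""
    width = midpoints[1] - midpoints[0]
    lowest_boundary = midpoints[0] - width / 2
    return int((value - lowest_boundary) // width)

midpoints = [13, 14, 15, 16, 17, 18, 19, 20]
limits = class_limits(midpoints)
for (lo, hi), m in zip(limits, midpoints):
    print(f"{lo}-{hi}   mid-point {m}")

# Check: the average of two adjacent lower limits gives back a mid-point.
lowers = [lo for lo, _ in limits]
recovered = [(a + b) / 2 for a, b in zip(lowers, lowers[1:])]
print(recovered)
```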

To find the mid-points, given the class limits, of the continuous 
variable of Table 11, we add the lower limit of an interval to 
the lower limit of the interval next higher on the scale, then average 
them. 

$$\frac{12.5 + 13.5}{2} = 13$$

$$\frac{13.5 + 14.5}{2} = 14$$

and so on. 

5. The Frequency Distribution: Nonquantitative Variable.²—Let 
us imagine that, instead of adopting quantitative classes for 
sizes of sibling families, as shown in Table 7, we had asked 
the students to state whether the size of their sibling 
family was large, medium, or small, without telling them what 
sizes of families should be placed in each of the three categories. 
We might then get a table something like Table 12. 


Table 12.—Size of Sibling Families

Size of            Students
Sibling Family     Reporting

Small                 116
Medium                 46
Large                  38

Total                 200


We may now call attention to three requirements of classification 
that were not mentioned in our previous work, although 
they were tacitly assumed. The first of these is that the categories 
must be mutually exclusive. The second is that they must 
be exhaustive. The third is that there must be only one basis of 
classification at a time. 


¹ Theoretically, the frequency of a value that is identical with a class limit 
should perhaps be divided equally between the two classes above and below; 
but in most practical work the method suggested above is more convenient 
and sufficiently accurate. 

² A nonquantitative variable is a quality that varies in amount, but is not 
measured in terms of units. 









With respect to the last-named requirement, the basis of 
classification in Table 12 is size of sibling family. There is no 
evidence that any other principle was used in this table. A 
question may be raised, however, about the first requirement. 
If we checked to see, we should certainly find that some sibling 
families of three were listed as small and some as medium, and 
that similar errors were made in the case of families of other 
sizes. Moreover, if the sibling families reported above were 
entered in Table 12 by two independent investigators, even 
though neither happened to put families of the same size in two 
different classes, there is little chance that they would both 
classify each size of family in the same class. Some would regard 
a family of three as medium, others would regard it as small. 
Their finished tables would not show the same frequencies in each 
class. Because these difficulties of classification multiply with 
the number of classes, it is usually advisable to have very few 
classes in a qualitative table, e.g., three in Table 12. This 
limits the analysis to broad categories. 

Regarding the principle of exhaustiveness of classification, we 
need to ask: Were there any families that could not be classified in 
one of the three classes of Table 12? Apparently there were not, 
so the table passes this test. 

Can we calculate the mean of Table 12, as we did in Table 8? 
At once the question arises, what are the values of the mid-points 
in Table 12? Since the classes in this table are not quantitative, 
no quantitative values can be assigned to their mid-points. We 
therefore discover that we are unable to analyze a nonquantitative 
table by the use of the mean. All that we can do is to say 
that the modal class, or the class containing the largest frequency, 
is that of small families. 
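
Finding the modal class requires nothing more than the frequencies themselves, as this short sketch with the figures of Table 12 shows. The variable names are our own.

```python
# Table 12: nonquantitative classes with their frequencies.
table12 = {"Small": 116, "Medium": 46, "Large": 38}

modal_class = max(table12, key=table12.get)   # class with the largest frequency
total = sum(table12.values())

print("modal class:", modal_class)
print(f"share of all students: {table12[modal_class] / total:.0%}")
```

No mid-points enter into the calculation, which is why it remains possible when the classes carry no quantitative values.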

From this illustration, we learn that nonquantitative tables 
not only are likely to violate the logical principle of classification, 
which requires that the several categories be mutually exclusive, 
but also do not lend themselves to the calculation of 
the mean and other basic statistical measures by which quantitative 
tables are customarily analyzed. For such reasons as 
these, quantitative classes are always to be preferred to qualitative 
for purposes of statistical analysis. The latter should be 
employed only where quantitative classes are not obtainable. 





6. The Frequency Distribution: Table Structure.¹—The main 
heading of a frequency table is called the title; the left-hand 
column with its heading, the stub; and the heading of the right- 
hand column, the caption. These are illustrated in Table 13. 


(Title)  Table 13.—The Size of Sibling Families of 200 Students of 
         Sociology, Blank College, 1939-1940

(Stub) Siblings in Family     (Caption) Students Reporting

          1-2                            94
          3-4                            62
          5-6                            28
          7-8                             8
          9-10                            3
         11-12                            3
         13-14                            1
         15-16                            1

         Total                          200


As far as feasible, a table should be self-explanatory, but 
no unnecessary word or figure should be included. The title 
should usually mention the variable in the stub first; the units 
of the caption and their number, second; and any further subdivisions 
of the stub or caption. It should also generally mention 
the date and place. The purpose of the stub and the caption 
is simply to indicate the nature of the entries in the columns. 
The “Total” row is often placed at the top instead of at the 
bottom of the table. One customary type of ruling is shown in 
Table 14. If it is desirable to block off one part of a table from 
another, this may be done by means of a heavy or double ruling. 


Table 14.—The Size of Sibling Families of 200 Students, Blank 
College, 1940-1941, by Urban and Rural Residence

---------------------------------------------------
                           Students reporting
Siblings in family    -----------------------------
                       Total     Urban     Rural
---------------------------------------------------
       1-2               94        28        66
       3-4               62        20        42
       5-6               28        11        17
       7-8                8         5         3
       9-10               3         2         1
      11-12               3         0         3
      13-14               1         0         1
      15-16               1         0         1
---------------------------------------------------
      Total             200        66       134
---------------------------------------------------


¹ More detailed discussion of this topic and of tables that do not represent 
frequency distributions will be found in the fifth and seventh references at 
the end of this chapter. 

The chief requirement of a good table is that it be simple and 
clear. For this reason, it is generally unwise to subdivide the 
stub or the caption very often. One simple subdivision of the 
caption is shown in Table 14. 

In the case of every subclassification of the data in a table, 
the principles of classification already mentioned apply. 
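The subdivided caption of Table 14 can be checked mechanically: each row total should equal the sum of its urban and rural entries, and the column totals should add to the grand total. The following is a minimal sketch in Python, not part of the original text; the function name `add_totals` and the dictionary layout are illustrative assumptions.

```python
# Table 14 as printed: class of sibling-family size -> (urban, rural) counts.
table_14 = {
    "1-2": (28, 66),
    "3-4": (20, 42),
    "5-6": (11, 17),
    "7-8": (5, 3),
    "9-10": (2, 1),
    "11-12": (0, 3),
    "13-14": (0, 1),
    "15-16": (0, 1),
}


def add_totals(table):
    """Return (row_totals, (urban_total, rural_total)) for a two-part caption."""
    row_totals = {cls: urban + rural for cls, (urban, rural) in table.items()}
    urban_total = sum(urban for urban, _ in table.values())
    rural_total = sum(rural for _, rural in table.values())
    return row_totals, (urban_total, rural_total)


row_totals, (urban_total, rural_total) = add_totals(table_14)
grand_total = urban_total + rural_total
print(row_totals["1-2"], urban_total, rural_total, grand_total)
```

A check of this kind is one simple way to "test the accuracy" of a table before it is ruled and titled.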

Exercises 

1. Tabulate each of the two following series in a frequency distribution,
showing class limits, and test the accuracy of each of the
tables. Note: The population numbers are so large that only whole 
hundreds or thousands should be used as class limits and mid-points; 
but in finding mid-points from the class limits the method suggested for 
a continuous variable should be used. An interval at least as small as 
6,000 seems to be needed to differentiate between the bulk of the county 
populations under 40,000. But above that point increasingly large 
intervals are appropriate. The last interval may be taken as “300,000 
and over,” with the actual population of the single largest county, 
318,587, given in a footnote. A table may be “broken” to avoid many 
intervals without frequencies. 





Georgia Counties, 1930*

County   Population   Population per     County   Population   Population per
                      square mile                              square mile
   1       13,314          29.3            46       21,699          50.1
   2        6,894          20.9            47       18,025          45.4
   3        7,055          26.0            48       22,306          65.2
   4        7,818          21.9            49        9,461          45.5
   5       22,878          74.5            50       18,273          34.9
   6        8,703          43.7            51        2,744           7.6
   7       12,401          73.8            52       10,164          22.7
   8       25,364          53.9            53       18,485          51.2
   9       13,047          51.0            54     [illegible]       31.5
  10       14,646          32.2            55        7,102          24.7
  11       77,042         278.1            56       12,969          32.3
  12        9,133          44.6            57        8,665          37.0
  13        6,895          15.9            58       48,667          96.9
  14       21,330          41.5            59       10,624          43.0
  15        5,952          13.8            60       15,902          57.0
  16       26,509          39.7            61      318,587       1,650.7
  17       29,224          30.6            62        7,344          16.7
  18        9,345          46.0            63        4,388          25.8
  19       10,676          37.2            64       19,400          44.2
  20        6,338           8.9            65       16,846          44.9
  21        9,903          46.9            66       19,200          43.2
  22        8,991          39.4            67       12,616          30.3
  23       34,272          69.7            68       27,853          63.3
  24        9,421          55.7            69       12,748          44.0
  25        4,381           5.5            70       30,313          69.4
  26      105,431         284.9            71       13,070          24.7
  27        8,894          40.8            72       13,263          46.7
  28       15,407          47.0            73       11,140          22.2
  29       20,003          46.6            74       15,174          58.1
  30       25,613         224.7            75        9,102          31.9
  31        6,043          34.2            76       15,924          49.1
  32       10,260          72.3            77       11,280          25.5
  33        7,015           9.4            78       12,199          32.3
  34       35,408         100.3            79       21,609          60.9
  35       19,739          31.2            80        8,594          26.8
  36       30,622          67.9            81        8,118          27.1
  37        8,793          25.1            82       20,727          32.1
  38       11,311          46.9            83       12,908          37.7
  39       25,127          56.7            84       12,681          43.4
  40        7,020          22.0            85        8,992          23.9
  41       17,343          62.6            86        9,754          53.0
  42        4,146          22.3            87        6,190          27.2
  43        3,502          16.2            88       32,693          40.6
  44       23,622          40.5            89        8,328          25.5
  45       70,278         258.4            90        8,153          15.0

* From the Fifteenth Census of the United States, 1930, Bureau of the Census, Washington,
D. C.




















Georgia Counties, 1930* (Continued)

County   Population   Population per     County   Population   Population per
                      square mile                              square mile
  91        7,847          27.0           126       20,503          25.8
  92     [illegible]       10.6           127        7,389          30.8
  93       29,994          62.1           128       23,495         112.4
  94        4,927          17.6           129       11,740          70.7
  95        9,014          31.4           130       11,114          27.0
  96        5,763          12.3           131       26,800          58.8
  97       16,643          50.1           132        8,468          27.1
  98       14,921          52.5           133        6,172          29.1
  99     [illegible]       19.4           134       15,411          33.1
 100       22,437          45.2           135       10,617          31.2
 101        9,076          35.9           136       14,997          40.2
 102        6,730          49.1           137       18,290          51.8
 103       23,620          43.1           138       32,612          61.5
 104       11,606          24.7           139       16,068          66.1
 105       10,020          52.7           140       17,165          43.7
 106       12,488          32.0           141        4,346          24.0
 107        9,215          26.9           142        7,488          28.6
 108       57,558       [illegible]       143       36,752          84.5
 109     [illegible]       66.0           144     [illegible]       48.5
 110        8,082          47.0           145     [illegible]       26.7
 111       12,927          25.6           146     [illegible]       19.6
 112       12,327          38.0           147     [illegible]       61.5
 113       10,268          57.4           148     [illegible]       60.7
 114        9,687          41.9           149       21,118          63.8
 115       12,522          36.3           150       26,558          34.1
 116       10,853       45.[illegible]    151       11,181          27.7
 117       25,141          79.3           152       25,030          37.4
 118        9,005          34.9           153       12,647          20.6
 119        8,367          23.2           154        5,032          16.7
 120        3,820          26.5           155        9,149          34.7
 121        6,331          16.8           156        6,056          24.7
 122       17,174          41.7           157       20,808          73.5
 123       72,990         228.8           158       13,439          33.3
 124        7,247          60.9           159       15,944          34.8
 125        5,347          34.7           160       10,844          23.0
                                          161       21,094          32.4


2. Subdivide the table of county populations prepared in Exercise 1 
above according to population per square mile, choosing your own points 
of division in the latter factor. 

3. Open a textbook in elementary sociology to some page at random,
and classify each word on the page as “Very short,” “Short,” “Average,”
“Long,” “Very long.” Show the results in tabular form. Do
the same thing for an elementary textbook in economics, and compare
the length of words in the two tables.

4. It is wanted to know the occupation of the fathers of students 
majoring in sociology. The students are asked to check the form 
below: 

Laborer 

Businessman 

Professional 

Farmer 

Is this satisfactory? 

Where would a carpenter-contractor be placed? A policeman? The 
proprietor of a radio repair shop? 

5. A study is to be made of farm wages in your state. How would
you define the unit of study?

6. Explain and illustrate the meaning of these terms: (a) array,
(b) range, (c) frequency distribution, (d) class interval, (e) mid-point,
(f) class limits, (g) grouped data.

7. What is the effect of tabulation by class intervals on the accuracy 
of statistics calculated from a table? Why is this? 

References 

Chaddock, R. E.: Principles and Methods of Statistics, Chaps. IV and V,
Houghton Mifflin Company, Boston, 1925.

Fifteenth Census of the United States, 1930: Population, Vol. II.

Garrett, H. E.: Statistics in Psychology and Education, pp. 1-8, Longmans,
Green & Company, New York, 1926.

Mills, F. C.: Statistical Methods, rev. ed., Chap. III, Henry Holt and
Company, Inc., New York, 1938.

Mudgett, B. D.: Statistical Tables and Graphs, Part I, Chap. III, Houghton
Mifflin Company, Boston, 1930.

Sorenson, H.: Statistics for Students of Psychology and Education, pp. 16-27,
McGraw-Hill Book Company, Inc., New York, 1936.

Walker, H. M., and W. N. Durost: Statistical Tables, Their Structure and
Use, Parts I and III, Bureau of Publications, Teachers College,
Columbia University, New York, 1936.

White, R. C.: Social Statistics, Chap. VI, Harper & Brothers, New York,
1933.

Yule, G. U., and M. G. Kendall: An Introduction to the Theory of Statistics,
Chap. VI, Charles Griffin & Company, Ltd., London, 1937.



CHAPTER VI 
GRAPHS 

1. Graphs of Frequency Distributions.—It is often helpful in 
interpreting a frequency distribution or other statistical data 
to show the facts in graphic form. One method of picturing a 
simple frequency distribution is by means of the histogram. 
Table 15 may be represented as shown in Fig. 7. 


Table 15.—Grades Made by 41 Students of Statistics, Blank
College, 1939-1940

Grades,        Students     Accumulated     Accumulated percentage
per cent                    frequency       frequency*
 63-67             2             2                 4.9
 68-72             5             7                17.1
 73-77             6            13                31.7
 78-82            10            23                56.1
 83-87            11            34                82.9
 88-92             5            39                95.1
 93-97             2            41               100.0
 Total            41

* Each accumulated frequency is expressed as a percentage of 41; e.g., 7/41 = 17.1 per cent.


In connection with the histogram, it should be noticed that, 
if the class intervals are taken as one unit each, the area of the 
figure is equal to the total frequency of the table. In Fig. 7, 
for example, 

$$\text{area} = 2 \times 1 + 5 \times 1 + 6 \times 1 + 10 \times 1 + 11 \times 1 + 5 \times 1 + 2 \times 1 = 41.$$
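The area rule for the histogram can be verified with a few lines of Python. This is only an illustrative sketch; the function name `histogram_area` is an assumption, and the frequencies are those of Table 15.

```python
# Frequencies of Table 15, one per class interval (63-67, 68-72, ..., 93-97).
frequencies = [2, 5, 6, 10, 11, 5, 2]


def histogram_area(freqs, widths=None):
    """Sum of rectangle areas; every interval is one unit wide by default."""
    if widths is None:
        widths = [1] * len(freqs)
    return sum(f * w for f, w in zip(freqs, widths))


area = histogram_area(frequencies)
print(area)  # equals the total frequency, 41
```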

A second device for picturing a simple frequency distribution is 
the frequency polygon, which is constructed by connecting the 
mid-points of the class intervals of the histogram by straight 
lines. It is shown in Fig. 8. If it is extended to the base line 
at the mid-points of the intervals next beyond the end intervals, 



and all equal intervals are taken as one unit in width, its total 
area is equal to that of the total frequency of the table, but the 
area over any one interval is usually not equal to the frequency 
in that interval. 
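Both statements about the frequency polygon can be checked numerically. The sketch below assumes equal intervals one unit wide and a polygon closed on the base line one unit beyond each end class; the helper names are hypothetical. The per-interval expression follows from the two half-width trapezoids that lie over a single class.

```python
def polygon_total_area(freqs):
    """Area under the closed polygon: trapezoids between successive mid-points."""
    heights = [0] + list(freqs) + [0]  # polygon returns to the base line
    return sum((a + b) / 2 for a, b in zip(heights, heights[1:]))


def polygon_area_over_interval(freqs, k):
    """Area of the polygon lying over the k-th class interval (0-based)."""
    heights = [0] + list(freqs) + [0]
    prev_f, f, next_f = heights[k], heights[k + 1], heights[k + 2]
    # Two half-unit trapezoids: (prev_f + 3f)/8 on the left, (next_f + 3f)/8 on the right.
    return (prev_f + 6 * f + next_f) / 8


table_15 = [2, 5, 6, 10, 11, 5, 2]
total = polygon_total_area(table_15)
first = polygon_area_over_interval(table_15, 0)
print(total, first)
```

For Table 15 the total comes to 41, while the area over the first interval is 2.125 rather than the tabular frequency of 2.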



Fig. 7.—Histogram of Table 15.¹

A histogram of the simple frequency distribution of Table 16, 
which has unequal intervals, appears in Fig. 9. 

Table 16.—Age Distribution for Mexicans in the United
States, 1930*

Age, Years          Number (Thousands)
Under 5                   21.48
5-9                       20.55
10-14                     14.81
15-19                     13.72
20-24                     14.65
25-29                     13.53
30-34                     10.11
35-44                     16.30
45-54                      9.53
55-64                      4.60
65-74                      1.96
75 and over                0.88
Total                    142.12

* Adapted from Abstract of the Fifteenth Census of the United States, 1930, Bureau of the
Census.

If we let each interval of five years on the base line be one unit, 
then, of course, an interval of 10 years will be two units, and the 
height of the rectangle in a 10-year interval will be one-half of 
the tabular frequency in that interval. The end interval, “75 

¹ Notice that graphs, such as those of frequency distributions, which
involve two sets of measurements, are erected on the framework of two
graduated straight lines drawn at right angles. The horizontal line is
called the X axis, the perpendicular line the Y axis. Frequencies are
conventionally measured on the Y axis (but see Fig. 13), scale values on the X
axis (see Fig. 7).


















and over,” in Table 16 is of unspecified length, and so cannot
be accurately represented geometrically. It is accordingly 
omitted from the graph, and its frequency removed from the 
total. The sum of the areas of the remaining rectangles is then 
equal to the corrected total frequency of the table. Moreover, 


Fig. 8.—Frequency polygon of Table 15.

the area of each rectangle is equal to the frequency in the
corresponding interval.
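The adjustment for unequal intervals can be sketched as follows, taking five years as one unit of width and omitting the open-ended class "75 and over," as the text directs. The names `UNIT` and `rectangle_heights` are illustrative assumptions; the figures are those printed in Table 16.

```python
UNIT = 5  # years on the base line that count as one unit of width

# (class, width in years, frequency as printed in Table 16); "75 and over" omitted.
table_16 = [
    ("Under 5", 5, 21.48), ("5-9", 5, 20.55), ("10-14", 5, 14.81),
    ("15-19", 5, 13.72), ("20-24", 5, 14.65), ("25-29", 5, 13.53),
    ("30-34", 5, 10.11), ("35-44", 10, 16.30), ("45-54", 10, 9.53),
    ("55-64", 10, 4.60), ("65-74", 10, 1.96),
]


def rectangle_heights(rows, unit=UNIT):
    """Height of each rectangle = frequency divided by width in units."""
    return {label: freq / (width / unit) for label, width, freq in rows}


heights = rectangle_heights(table_16)
# Area of each rectangle = height x width in units, which recovers the frequency.
total_area = sum(heights[label] * (width / UNIT) for label, width, _ in table_16)
print(heights["35-44"], round(total_area, 2))
```

The 10-year class 35-44 is drawn at half its tabular frequency, and the areas add to the corrected total, 142.12 less 0.88.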

If a polygon is drawn on Fig. 9 in the usual way, neither the 
total area of the polygon nor the area in any interval will be 
equal to the corresponding tabular frequency. The total area 
can be made equal to the total frequency, however, if the polygon 
is drawn to the mid-points of five-year intervals throughout, 
using the same frequencies (heights) as in Fig. 9. 



Notice that if the frequencies were known and graphed for
each year of age, instead of for each five- or 10-year age interval,
the rectangles of the histogram in Fig. 9 would become more
numerous and narrow. If then the frequencies in each year
were separated by months, we should have still more and
narrower rectangles. If this process of subdivision of intervals









were continued indefinitely, we should have a smooth curve 
instead of a histogram or a polygon in Fig. 9. It is apparent that 
if each minute interval were then regarded as being one unit in 
width, the area under any part of the smooth curve would be 
equal to the frequency over the same portion of the table (see 
Fig. 10). A great deal of use is made of this fact in some of the 
chapters that follow. 

A polygon may be smoothed by passing through it a freehand 
curve. This is a somewhat questionable way of judging how the 
distribution would appear if the size of the sample were greatly 
increased. 



Fig. 10.—Histogram of Fig. 9 reduced to a smooth curve.

When histograms or polygons are to be compared, they 
should be graphed in terms of percentage rather than absolute 
frequencies. 

A very useful type of graph in the interpretation of a frequency
distribution is the cumulative curve, or ogive. The
accumulated frequencies for Table 15, forming a cumulative
frequency distribution, may be seen in the third column of that
table. Since each accumulated frequency merely shows the
total number of values that are less than the lower limit of the
class just above on the scale, a frequency distribution in
accumulative form is sometimes called a less than frequency distribution.
Plotting should be done carefully on coordinate paper, in order
that the resulting graph may be accurate enough for computing
purposes. The cumulative frequency curve for Table 15 is
shown in Fig. 11.









Notice in Fig. 11 that the frequency in each class interval is 
plotted on the upper class limits to show that a particular number 
of students made a grade less than the one indicated by that 
limit. Thus, 34 students in the given course made grades less 
than 88, and 39 made grades below 93. 
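The accumulated columns of Table 15 can be produced by a running sum. The sketch below is illustrative only; it pairs each accumulated frequency with the lower limit of the next class, so that the pair reads "this many students made less than this grade."

```python
from itertools import accumulate

frequencies = [2, 5, 6, 10, 11, 5, 2]              # Table 15
next_lower_limits = [68, 73, 78, 83, 88, 93, 98]   # lower limit of the class above

cumulative = list(accumulate(frequencies))
total = cumulative[-1]
cumulative_percent = [round(100 * c / total, 1) for c in cumulative]

# "Less than" distribution: grade -> number of students below that grade.
less_than = dict(zip(next_lower_limits, cumulative))
print(cumulative)
print(cumulative_percent)
print(less_than[88], less_than[93])
```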

Not only does the cumulative frequency curve give a picture 
of the distribution of frequencies that is different from that 
shown by the histogram or polygon, but it may also be employed 
for interpolation and computation. If, for example, we are 
given Table 15 but know nothing else about the data, and wish 
to change the class limits of the table, we can sometimes do this 




most conveniently by means of the cumulative curve. Suppose 
that we want the class limits of 65 to 69, 70 to 74, 75 to 79, 
and so on. How many students made grades falling in each of 
these new intervals? This can be decided approximately by 
erecting perpendiculars at the points 65, 70, 75, and so on, on 
the base scale, noting where they intersect the cumulative curve, 
and drawing horizontal lines from these points of intersection 
to the frequency scale at the left. Thus, the horizontals cut 
the frequency scale at approximately the values 1, 5, 10, 18, 29, 
37, 40. We can accordingly set up a new frequency table,
Table 17, whose last column is obtained by subtracting in the
second column the accumulative frequency in each class from
that in the class just above.
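Reading the cumulative curve at new class limits amounts to linear interpolation between the plotted points. The sketch below makes one assumption that the text does not state: the accumulated frequencies are placed at the real upper limits 67.5, 72.5, and so on, with the curve starting from zero at 62.5. Under that assumption straight-line readings come out at 1, 4.5, 10, 18, 28.5, 36.5, and 40, close to the rounded graph readings quoted above.

```python
def interpolate(x, xs, ys):
    """Read y at x along the straight-line segments joining the points (xs, ys)."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")


# Assumed plotting positions (real class limits) and accumulated frequencies.
limits = [62.5, 67.5, 72.5, 77.5, 82.5, 87.5, 92.5, 97.5]
cumulative = [0, 2, 7, 13, 23, 34, 39, 41]

new_limits = [65, 70, 75, 80, 85, 90, 95]
readings = [interpolate(x, limits, cumulative) for x in new_limits]

# Successive differences give the frequencies of the classes 60-64, 65-69, ..., 95-99.
bounds = [0] + readings + [41]
new_frequencies = [upper - lower for lower, upper in zip(bounds, bounds[1:])]
print(readings)
print(new_frequencies)
```

The fractional frequencies would be rounded in practice, as in Table 17; their sum remains 41.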

When it is desired simply to halve or combine the class intervals 
of a simple frequency distribution, the work may be done by 
direct division or addition more easily than by the use of a 





Table 17.—Illustrating Change of Class Intervals of Table 15, by
Use of Cumulative Frequency Curve

Grades, per cent     Accumulative frequency     Students
    60-64                      1                    1
    65-69                      5                    4
    70-74                     10                    5
    75-79                     18                    8
    80-84                     29                   11
    85-89                     37                    8
    90-94                     40                    3
    95-99                     41                    1
    Total                                          41


cumulative curve. Thus, if in Table 15 the intervals are to be
halved, the frequencies in each interval are also halved
correspondingly. It is often desirable, however, to modify this


Fig. 12.—Ogive in terms of percentage frequencies.

method somewhat by allowing for the shape of the curve. For
example, if the curve is rising in the interval, more of the
frequencies may be placed in the upper than in the lower subdivision
of the interval.

Percentage frequencies are often substituted for absolute 
frequencies on the ogive. Figure 12 is the same as Fig. 11 except 
for this change. From it we read on the Y axis that 50 per cent 
of the students made a grade of less than 82 on the X axis, 
approximately; 75 per cent made less than a grade of about 87;
and so on. The readings can be more accurate if finely ruled 
coordinate paper is used. 
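Reading a grade from a given percentage is the same interpolation run in the other direction. The sketch below again assumes that the points of the ogive sit at the real class limits 62.5, 67.5, and so on (an assumption, not stated in the text), and the function name is hypothetical. With straight-line segments the readings are about 81 and 86, in rough agreement with the graph readings quoted above.

```python
limits = [62.5, 67.5, 72.5, 77.5, 82.5, 87.5, 92.5, 97.5]   # assumed positions
cumulative = [0, 2, 7, 13, 23, 34, 39, 41]                  # Table 15
cum_percent = [100 * c / 41 for c in cumulative]


def grade_below_which(percent):
    """Grade on the X axis below which the given percentage of students fall."""
    for x0, p0, x1, p1 in zip(limits, cum_percent, limits[1:], cum_percent[1:]):
        if p0 <= percent <= p1:
            return x0 + (x1 - x0) * (percent - p0) / (p1 - p0)
    raise ValueError("percentage must lie between 0 and 100")


grade_50 = grade_below_which(50)
grade_75 = grade_below_which(75)
print(round(grade_50, 2), round(grade_75, 2))
```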






Values may be accumulated on both scales, X and Y, and
expressed as percentages of their respective totals. This has
been done in Table 18 and Fig. 13. Each pair of accumulative
percentages determines a point, and they are called the
coordinates of the point. For example, the first two accumulative
percentages in the table furnish the coordinates (6.6, 0.2),
the one on the left (6.6) being an X value and the one on the
right (0.2) a Y value. The point is located on the chart by going
a distance of 6.6 percentage units from 0 along the X axis, and
then perpendicularly up a distance of 0.2 percentage units.

Table 18.—Number of Farms by Size, Kansas, 1930*

                                                      Per cent of total    Accumulative per cent
Size of farm,     Number of                           Farms     Acres      Farms     Acres
acres             farms            Acres
Under 20             11,004            86,739           6.6       0.2        6.6       0.2
[illegible]           9,264           312,710           5.6       0.7       12.2       0.9
[illegible]          19,226         1,475,364          11.6       3.1       23.8       4.0
[illegible]       [illegible]       6,319,557          25.8      13.5       49.6      17.5
[illegible]       [illegible]       5,565,698          15.4      11.8       65.0      29.3
[illegible]       [illegible]      13,796,240          23.1      29.4       88.1      58.7
[illegible]       [illegible]      10,243,252           9.1      21.8       97.2      80.5
[illegible]       [illegible]       7,184,515           2.7      15.3       99.9      95.8
[illegible]       [illegible]       1,991,572           0.1       4.2      100.0     100.0
Total             [illegible]     [illegible]         100.0     100.0

* Adapted from Fifteenth Census of the United States, Bureau of the Census.





The resulting curve is called the Lorenz curve. From it we can
see that 50 per cent of the farms, i.e., the small farms (reading
from the left on the X axis), include 18 per cent of the total farm
acreage (reading from the bottom on the Y axis); that about
100 - 65 = 35 per cent of the farms, i.e., the large farms (reading
from the right on the X axis), include 100 - 30 = 70 per cent
of the total farm acreage (reading from the top on the Y axis);
and so on.
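The coordinates of the Lorenz curve and the readings just described can be reproduced from the percentage columns of Table 18. The following sketch is illustrative; the function name `acres_share` is an assumption, and straight-line interpolation between plotted points stands in for reading the drawn curve.

```python
from itertools import accumulate

# Percentage of farms and of acreage in each size class, smallest farms first (Table 18).
farms_pct = [6.6, 5.6, 11.6, 25.8, 15.4, 23.1, 9.1, 2.7, 0.1]
acres_pct = [0.2, 0.7, 3.1, 13.5, 11.8, 29.4, 21.8, 15.3, 4.2]

# Accumulate both scales; the curve starts at the origin (0, 0).
x = [0.0] + [round(v, 1) for v in accumulate(farms_pct)]
y = [0.0] + [round(v, 1) for v in accumulate(acres_pct)]


def acres_share(farm_share):
    """Per cent of acreage held by the smallest `farm_share` per cent of farms."""
    for x0, y0, x1, y1 in zip(x, y, x[1:], y[1:]):
        if x0 <= farm_share <= x1:
            return y0 + (y1 - y0) * (farm_share - x0) / (x1 - x0)
    raise ValueError("farm_share must lie between 0 and 100")


print(list(zip(x, y))[1:3])           # first coordinates: (6.6, 0.2), (12.2, 0.9)
print(round(acres_share(50), 1))      # about 18 per cent of the acreage
print(round(100 - acres_share(65), 1))  # acreage held by the largest 35 per cent
```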

Further uses of the cumulative curve for computation will be 
shown later, under the topic of partition values (Chap. VIII). 

2. Graphs of Time Series.—Statistical data often take the
form of a time series, rather than of a frequency distribution. A
time series is a set of values of a variable that correspond to















certain time intervals, such as years or months. For example,
the populations of a state in 1920 and again in 1930 are a very
brief time series (see Fig. 14).

In plotting the increase of one variable, e.g., the population
of a state, in terms of a second variable, e.g., years, it is often


Fig. 13.—Lorenz curve for Table 18.

Fig. 14.—Population growth in absolute amounts.

of more interest to show the proportionate increase than the 
absolute increase. For example, if a population of 3.0 millions 
increases to 3.5 millions in 10 years, the increase is much less 
impressive than when a population of 0.2 million increases to 












0.7 million in the same period. Yet if the absolute increase is 
plotted, this difference will not appear, as may be seen from Fig. 
14, where the two growth lines are exactly parallel. To meet 
this objection, the percentage increase may be plotted. The 
growth from 3.0 to 3.5 millions is a percentage increase of 17,
that from 0.2 to 0.7 million is a percentage increase of 250. This
is shown in Fig. 15, where the line representing the growth
of the population of 0.2 million is much steeper than that
representing the growth of the population of 3.0 million. In making
Fig. 15, the rate of growth in terms of the initial population is 
required. Instead of going to the trouble of computing these 
rates, much the same results may be accomplished by plotting 

Fig. 15.—Population growth in terms of percentage increase.

the absolute figures on a semilogarithmic scale. The latter 
method is usually preferred to the former, because semilogarithmic 
paper can be obtained at small cost, and the use of it saves much 
labor. 

Figure 16 shows the above population figures plotted directly 
on semilogarithmic paper. 

In Fig. 16, notice that the increase in population from 0.2 to 
0.7 million is again represented by a much steeper line than is 
the increase from 3.0 to 3.5 millions. 

While the semilogarithmic scale does not show in strictly 
accurate proportion one to another all percentage changes, it 
represents equal percentage changes by equal slopes, and saves 
much labor compared with percentage charts such as that shown 
in Fig. 15. 
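The arithmetic behind the two kinds of chart can be set side by side. In the sketch below (function names are illustrative assumptions), `percent_increase` gives the figures plotted on a percentage chart, while `log_rise` gives the vertical rise on a logarithmic scale, which depends only on the ratio of the end value to the starting value.

```python
import math


def percent_increase(start, end):
    """Increase expressed as a percentage of the initial value."""
    return 100 * (end - start) / start


def log_rise(start, end):
    """Vertical rise on a logarithmic (base 10) scale."""
    return math.log10(end) - math.log10(start)


# The two populations of the text, in millions.
print(round(percent_increase(3.0, 3.5)), round(percent_increase(0.2, 0.7)))
print(round(log_rise(3.0, 3.5), 3), round(log_rise(0.2, 0.7), 3))

# Equal percentage changes give equal rises, whatever the starting level.
print(math.isclose(log_rise(100, 150), log_rise(1000, 1500)))
```

Over the same ten-year span an equal rise means an equal slope, which is the property claimed for the semilogarithmic scale.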

In using semilogarithmic paper, the repeated series of values 
1 to 9 usually printed on the vertical scale may be multiplied by 
any constant, provided the constant is applied to the whole scale. 





Thus, in Fig. 16 the scale may be multiplied, say, by 7, by 0.5,
or by any other number, when thereby it will be made more
convenient for the plotting of particular data. A semilogarithmic
scale cannot contain a zero value.



In all graphic representation of data, the shape of the curve 
or figure is affected by the ratio of the X and Y scales. Since 
this ratio is usually a matter of arbitrary choice, advantage is 
sometimes taken of the opportunity to produce certain desired
impressions. Figures 17 through 19 from Table 19 illustrate 
only three of many possibilities. 








Table 19.—Increase in Enrollment of the Blank Military Academy,
1928-1938

Year     Enrollment        Year     Enrollment
1928        110            1934        118
1929        113            1935        120
1930        114            1936        122
1931        113            1937        127
1932        118            1938        130
1933        119




In Fig. 17, a rather moderate increase in enrollment is made 
impressive by (1) using a large single-unit spacing on the Y scale, 
(2) starting the increase from the base (bottom) line, and so 
avoiding any comparison between the amount of increase and the 
original volume of enrollment, (3) showing each year's increase as
a percentage of the enrollment in 1928, instead of as a percentage 


Fig. 17.—Graph of data of Table 19.


of the enrollment of the preceding year. Figure 18 removes 
criticism (2) above, and avoids criticism (3) by using absolute 
enrollment figures instead of percentages of increase relative 
to the total enrollment in 1928. Figure 18 is still open to 
criticism (1) above, because the ratio of the X and Y units is 
not changed. In fact, the X and Y units are different in nature, 
so that it is impossible to say when one bears a just relation to 
the other. 

Figure 19 meets criticism (3) by plotting the enrollment 
figures on a semilogarithmic scale. The total enrollment is not 









entirely pictured in the diagram because the semilogarithmic 
scale begins at 1 instead of at 0, but this is a minor matter. 
Evidently, the growth of the school makes a much poorer showing 
in Fig. 19 than in either Fig. 17 or Fig. 18. Probably Fig. 19 
gives the most realistic picture of the facts in this particular case. 
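The contrast between the two ways of expressing the increase in Table 19 can be seen from the figures themselves. The sketch below is illustrative only; it takes the 1928 enrollment as 110, which is a reading of a poorly printed cell and should be treated as an assumption.

```python
# Table 19; the 1928 entry is read as 110 (assumed, the printed cell is unclear).
enrollment = {
    1928: 110, 1929: 113, 1930: 114, 1931: 113, 1932: 118, 1933: 119,
    1934: 118, 1935: 120, 1936: 122, 1937: 127, 1938: 130,
}
years = sorted(enrollment)
base = enrollment[years[0]]

# Criticism (3): each year's increase as a percentage of the 1928 enrollment.
increase_over_1928 = {y: 100 * (enrollment[y] - base) / base for y in years}

# The alternative: each year's change as a percentage of the preceding year.
year_over_year = {
    y: 100 * (enrollment[y] - enrollment[p]) / enrollment[p]
    for p, y in zip(years, years[1:])
}

print(round(increase_over_1928[1938], 1))     # cumulative growth since 1928
print(round(year_over_year[1938], 1))         # growth during the last year only
print(round(max(year_over_year.values()), 1)) # largest single-year growth
```

On these figures the cumulative increase by 1938 is about 18 per cent, while no single year grew by as much as 5 per cent, which is the moderate picture that the semilogarithmic chart conveys.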



Fig. 18.—Absolute increase in enrollment of the Blank Military Academy,
1928-1938.


3. Miscellaneous Graphs. —A common device for the graphic 
comparison of amounts or percentages is the bar chart, either 
upright or horizontal. The histogram of Fig. 7 above can be 
regarded as essentially an upright bar chart. Figure 20 shows a 
horizontal bar chart applied to Table 20. 


Table 20.—Percentage of Females 15-44 Years of Age Married,
Selected European Countries*

Country                    Percentage of Females Married
Bulgaria                              67.0
England and Wales                     48.5
France                                57.1
Germany                               48.4
Italy                                 48.4
Sweden                                42.3

* Adapted from W. S. Thompson, Population Problems, 2d ed., p. 104, McGraw-Hill Book
Company, New York, 1935.












Fig. 19.—Rate of increase in enrollment of the Blank Military Academy, 1928-1938.

Fig. 20.—Percentages of females 15-44 years of age married in selected European
countries. (From W. S. Thompson, op. cit., p. 104.)






Two variations of the bar chart are seen in Figs. 21 and 22. 
Instead of the bar chart, comparisons are often made in terms 
of the areas of squares or circles, or of the volumes of cubes or 
spheres, as in Figs. 23 and 24. These devices, however, force the 


Fig. 21.—Percentage of the population of the United States represented by
each race, 1930. (Adapted from R. Clyde White, Social Statistics, p. 178, Harper
& Brothers, New York, 1933.)

Fig. 22.—Percentage of white and Negro races among the commitments
to prisons and reformatories, 1910 and 1923. (From R. Clyde White, op. cit.,
p. 179.)

Fig. 23.—Ratio of native born to foreign born in City X: 1930.

Fig. 24.—Ratio of native born to foreign born in City X: 1930.

Fig. 25.—World distribution of telephones. (Adapted from G. R. Davies
and Dale Yoder, Business Statistics, p. 40, John Wiley & Sons, Inc., New York,
1937.)

eye to perform the rather difficult feat of measuring two or even 
three dimensions simultaneously. 

The so-called “pie chart,” pictured in Fig. 25, is convenient
for showing how a whole is subdivided. 








Fig. 26.—Estimated unemployment, United States, March, 1929, and March,
1931. (Adapted from On Relief, Federal Emergency Relief Administration,
Chart IX.)

Fig. 27.—Percentage of unemployment relief expenditure paid locally, state of
Wisconsin, year ending Sept. 1, 1934.






More realistic and striking than any of the preceding devices 
are pictograms, of which Fig. 26 is an example. 

Maps are treated in many ingenious ways for statistical 
purposes. Crosshatching (see Fig. 27), the insertion of pictograms,
and spotting are common devices.

In any attempt to present statistical figures in graphic form, 
the following two principles are to be kept in mind. (1) The 
graph should be more quickly and easily comprehended than the 
same data in tabular or nongraphic form. Graphs are sometimes 
so complex or ingenious that they can be deciphered only with 
the aid of the textual and tabular material that they are intended 
to clarify. (2) The graph should not misrepresent or exaggerate 
the facts. 


Exercises 


1. The following figures taken from the Fifteenth Census of the United
States represent the growth in the population of Milwaukee:

Year     Population        Year     Population
1930      578,249          1880      115,587
1920      457,147          1870       71,440
1910      373,857          1860       45,246
1900      285,315          1850       20,061
1890      204,468          1840        1,712


Show these data graphically. 

2. Suppose that you grade the behavior of a group of juvenile
delinquents in a reform school and want to post a weekly chart showing the
standing of each delinquent. Describe briefly the kind of chart you
would use.

3. The charts on page 92 show how the numbers of the insane,
epileptic, and feeble-minded persons in state institutions and the prison
population in a certain state have increased in the last 25 years.

Have you any criticism of these charts? 

4. Plot the distribution shown below as a frequency polygon. Show
that the area under the polygon is equal to the total frequency, but that
the area in some intervals is not equal to the frequency in those intervals.

Distribution of 106 Employees by Age Class

Age of Employee, Years       Employees
        15-24                   14
        25-34                   49
        35-44                   23
        45-54                   13
        55-64                    6
        65-74                    1







[Charts for Exercise 3: panels for 1912, 1927, and 1937 showing the insane, the feeble-minded and epileptic, and the penal population. The plotted figures are not legible in this copy.]

5. Rearrange the frequencies of question 4 in class intervals of 15-18,
19-22, 23-26, and so on.

6. Plot the data of Problem 4 as an ogive, and read off the age below
which 75 per cent of the employees fall.

7. Devise a problem for which a Lorenz curve is suitable, graph the
curve from your data, and show its use.

8. Plot the following data in such a way that (a) it gives an unbiased
picture of the rate of change, (b) it exaggerates the impression of the
rate of change.






Population of Madison, Wis., 1890-1940*

Year       Population
1890         13,426
1900         19,164
1910         25,531
1920         38,378
1930         57,899
1940         66,802

* From the Fifteenth Census of the United States, Bureau of the Census.


References 

Croxton, F. E., and D. J. Cowden: Applied General Statistics, Chaps. IV,
V, VI, Prentice-Hall, Inc., New York, 1939.

Karsten, K. G.: Charts and Graphs, Prentice-Hall, Inc., New York, 1923.

Mudgett, B. D.: Statistical Tables and Graphs, Part II, Houghton Mifflin
Company, Boston, 1930.



CHAPTER VII 
AVERAGES AND RATES 


1. The Need for an Average.—An investigator is interested, let
us say, in the height of the residents of a certain Swiss
community in the United States, on the theory that they are taller
than their relatives in the old country. Unable to measure
the whole community of over 4,000 persons, he takes a random
sample of, say, 182 adult males, and gets their heights as
accurately as possible. He then finds himself with 182 individual
measurements. What will he do with them? He may perhaps
first arrange them in order of magnitude, to form an array.
If no two of the measurements happen to be identical, he will
still have 182 different measurements. In any case, it will be
impossible for him to hold in mind all the separate values, and
he will feel the need of some one figure by which to represent
them. This need will be still greater when he attempts to
determine whether or not the American group is taller than a
similar group in the Old World, because some of the former will
be taller than some of the latter, and vice versa. In his search
for a single figure by which to represent the many, he will
certainly arrive at the idea of calculating an average.

2. The Mode.—The simplest form of average is the mode (Mo),
which is merely the value in a series that occurs most often.
If the heights are all different, there can be no mode in ungrouped
data. If some persons are of the same height, however, a mode
may occur in our array. We then choose as the mode the height
that occurs the greatest number of times. For example, in the
following array of the heights of 10 European Swiss males, 4 ft.
11 in., 5 ft. 3 in., 5 ft. 7 in., 5 ft. 8 in., 5 ft. 9 in., 5 ft. 9 in., 5 ft.
10 in., 5 ft. 11 in., 6 ft. 0 in., the mode is 5 ft. 9 in.; but of course
the sample is too small to give much information about the
modal height of European Swiss males in general. Whether or
not a mode is convincing depends on how conspicuously the
modal height stands out above the others in frequency of
occurrence. If the height 5 ft. 7 in. occurs 10 times and the height 5 ft.
7.3 in. occurs nine times, it is not certain that one is significantly
more frequent than the other.
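Finding the mode of ungrouped data is a matter of counting. The sketch below is illustrative; the function name `modes` is an assumption, and the list holds the nine heights legible in the array above, converted to inches.

```python
from collections import Counter

# Heights from the array in the text, in inches (4 ft. 11 in. = 59, and so on).
heights_in = [59, 63, 67, 68, 69, 69, 70, 71, 72]


def modes(values):
    """Values that occur most often; empty if every value occurs only once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # all values different: no mode in ungrouped data
    return sorted(v for v, c in counts.items() if c == top)


print(modes(heights_in))   # [69], i.e., 5 ft. 9 in.
print(modes([59, 63, 67])) # [] when all heights differ
```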

The situation becomes clearer if we decide to overlook slight
differences in height, and combine our measurements in carefully
chosen class intervals, as in Table 21. If the distribution
is rather regular, we may by inspection then determine whether
or not any one class interval has a sufficiently larger frequency
than any other to be confidently regarded as the modal class. If
so, that is usually all we need to know. In Table 21, col. (1),
the modal interval is evidently 60 to 64 inches. In col. (2),
the distribution has two modes, i.e., it is bimodal, suggesting
that it may contain both males and females. In such cases, it
often helps to plot the data in the form of a frequency polygon
or histogram, e.g., Fig. 28.

Fig. 28.—Histogram of bimodal frequency distribution of Table 21,
col. (2).


Table 21.—Heights of 165 and 182 American Males of Swiss Descent

Height, inches         Males
                    (1)     (2)
45-49                 2       2
50-54                10      10
55-59                21      55
60-64                55      21
65-69                40      57
70-74                32      32
75-79                 5       5
Total               165     182


Determination of the exact modal value in grouped data is
complex, and cannot be treated here. Several rough methods
of interpolating within the modal class are available, such as that
of formula (1):

$$Mo = L + \frac{\Delta_1 i}{\Delta_1 + \Delta_2}, \tag{1}$$

1 Δ is the capital Greek letter Delta.

where L is the lower limit of the modal class, $\Delta_1$ is the difference
(disregarding signs) between the frequency of the modal class
and the frequency of the class just below the modal class on the
scale, $\Delta_2$ is the difference (disregarding signs) between the
frequency of the modal class and the frequency of the class just
above the modal class, and i is the size of the modal class interval.
Applying this formula to the distribution of Table 21, col. (1),
we find the crude mode.

$$Mo = 60 + \frac{(55 - 21)(5)}{(55 - 21) + (55 - 40)},$$

$$Mo = 63.5.$$
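The interpolation of formula (1) can be sketched in a few lines of Python. This is only an illustration under the definitions given above; the function name `crude_mode` and its argument names are assumptions made for the example, not part of the text.

```python
def crude_mode(lower_limit, f_below, f_modal, f_above, width):
    """Crude mode by formula (1): Mo = L + (D1 * i) / (D1 + D2)."""
    d1 = abs(f_modal - f_below)   # Delta_1: modal class vs. class just below
    d2 = abs(f_modal - f_above)   # Delta_2: modal class vs. class just above
    return lower_limit + d1 * width / (d1 + d2)


# Table 21, col. (1): modal class 60-64 (f = 55); neighbours 55-59 (f = 21)
# and 65-69 (f = 40); class interval of 5 inches.
mode_col1 = crude_mode(60, 21, 55, 40, 5)
print(round(mode_col1, 1))
```

The nearer the frequency of a neighbouring class is to that of the modal class, the more the estimate is drawn toward that neighbour.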


Another approximate method of finding the mode of a frequency
distribution is provided by formula (2):¹

$$Mo = M - 3(M - Md), \tag{2}$$

where M is the arithmetic mean of the distribution, and Md is the
median, as described below. Assuming that for the distribution
of Table 21, col. (1), M = 64.68, and Md = 64.5, formula (2)
gives for the crude mode:

$$Mo = 64.68 - 3(64.68 - 64.5),$$

$$Mo = 64.1.$$

This value is a little different from that found by formula (1).

Mention of the conditions under which formula (2) is appropriate
is made in Sec. 5, below.
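Formula (2) is a single line of arithmetic, and a short Python sketch makes the substitution explicit. The function name `mode_from_mean_and_median` is an assumption chosen for this illustration; the input values are those quoted in the text.

```python
def mode_from_mean_and_median(mean, median):
    """Crude mode by formula (2): Mo = M - 3(M - Md)."""
    return mean - 3 * (mean - median)


# Table 21, col. (1): M = 64.68 and Md = 64.5, as assumed in the text.
crude = mode_from_mean_and_median(64.68, 64.5)
print(round(crude, 1))
```

Because the formula multiplies the gap between mean and median by three, small errors in either average are magnified in the estimated mode.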

3. The Median.—Suppose that we have the following array 
of the heights of 11 American adult males of Swiss descent: 


Table 22.—Heights of 11 American Adult Males of Swiss Descent

Male    Height, inches
 1          68
 2          68.5
 3          69
 4          69.5
 5          70
 6          71
 7          71.5
 8          72
 9          72.5
10          73
11          74

1 See derivation of formula (54), Chap. IX.


A quick way of getting some idea of an average for this series 
would be to note the height that stands at the middle of the 
series. This is seen to be 71 inches, or height number 6 in rank 
order. This kind of average is called the median, which is 
defined as the middle value, or that value which is exceeded by 
as many values as it exceeds. 

Now if a twelfth person of, say, height 75 inches is added to the
above group, a difficulty arises. There is no middle value.
Unless we are willing to take the mean height of the sixth and
seventh persons, $\frac{71 + 71.5}{2} = 71.25$, as the median, we must
say that there is none. Although the median so found, 71.25
inches, is a height that does not actually appear in the series,
it is customary for most purposes to accept it as the median.

Consider another common case. Let the height of the fifth 
person in the first group of 11 persons be 71 inches. Again, 
strictly speaking, there can be no median that meets the defini¬ 
tion, because there are no longer as many heights below the 
middle height as there are above it. As before, a compromise 
is commonly made by taking the middle value (71 inches) as the 
median. 

From the above, it will be noticed that the formula for locating
the median value in an ungrouped series is to add one to the
number of values and divide by two:

$$\frac{N + 1}{2}. \tag{3}$$

Thus, above, where N = 11, the position of the median value is
$\frac{11 + 1}{2} = 6$, or the value in position 6; and where N = 12,
$\frac{12 + 1}{2} = \frac{13}{2} = 6.5$, the median value is the height in position
6.5, which can only be the mean of the heights in positions 6 and 7.
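The position rule of formula (3) can be sketched as follows. The function name `median_ungrouped` is an assumption made for the example; the heights are those of Table 22, with the twelfth person of 75 inches added for the second case.

```python
def median_ungrouped(values):
    """Median of an ungrouped series, located at position (N + 1) / 2."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2       # formula (3), counted from 1
    whole = int(position)
    if position == whole:                   # odd N: a single middle value
        return ordered[whole - 1]
    # even N: mean of the two values on either side of the position
    return (ordered[whole - 1] + ordered[whole]) / 2


heights = [68, 68.5, 69, 69.5, 70, 71, 71.5, 72, 72.5, 73, 74]
median_11 = median_ungrouped(heights)           # position 6
median_12 = median_ungrouped(heights + [75])    # position 6.5
print(median_11, median_12)
```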

The above relates to ungrouped data. When the items of a
series are grouped in class intervals, the median is regarded
as the value on the X scale that divides the area of the frequency
histogram or curve into two equal parts, as shown in Fig. 28.
Thus, in Table 21, col. (2), N/2 = 182/2 = 91. Now, 88 frequencies
fall below the class limit, 65 inches, so that 91 - 88 = 3
of the frequencies inside the interval 65 to 69 fall below the median.
Since there are 57 frequencies in this interval, and the width of the
interval is 5 inches, the median falls $\frac{3}{57} \times 5 = 0.263$ inch inside
the interval, or at the point 65 + .263 = 65.263 inches on the X scale.
The area below the median is then
$2 + 10 + 55 + 21 + \frac{0.263}{5}(57) = 91$, that above the median is
$\frac{4.737}{5}(57) + 32 + 5 = 91$, and the two areas are equal.

The simplest way to find the median of grouped data is as
follows: Accumulate the frequencies, as in the last column of
Table 23. Divide N by 2: 165/2 = 82.5. Look down the
column of accumulated frequencies until the frequency in the
position 82.5 is found, in the interval 60-64. From 82.5 subtract
the accumulated number of frequencies below the median
interval: 82.5 - 33 = 49.5. Multiply the width of the class
interval by the fraction 49.5/55, formed by the difference just
found as numerator and the frequency of the median interval
as the denominator: 5 × 49.5/55 = 4.5. Add this product to
the lower limit of the median interval: 60 + 4.5 = 64.5. This
is the median height for the table.


Table 23.—Height of 165 American Adult Males of Swiss Descent

Height, inches            Males
                  Number    Accumulated number
45-49                2              2
50-54               10             12
55-59               21             33
60-64               55             88
65-69               40            128
70-74               32            160
75-79                5            165
Total              165

We can express the above steps by means of a formula, which
is applicable to frequency distributions:

$$Md = L + \left(\frac{\frac{N}{2} - F}{f}\right) i, \tag{4}$$

where L is the lower limit of the class interval in which the
median falls, F is the number of accumulated frequencies that fall
below (in class intervals with limits smaller than those of) the
median class interval, f is the number of frequencies in the
median class interval, i is the size of the median class interval,
and N is the total frequency of the table. N/2 is first found, and
then the remaining symbols can be evaluated and substituted
in the formula, as indicated in the preceding paragraph. Thus,
for the problem above, $Md = 60 + \left(\frac{82.5 - 33}{55}\right)5 = 64.5$, as
before.
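The steps of formula (4) may be sketched in Python as below. The function name `grouped_median` and the list layout are assumptions made for this illustration; the frequencies are those of Table 21, cols. (1) and (2), with class intervals of 5 inches.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of a frequency distribution by formula (4)."""
    half = sum(freqs) / 2                 # N / 2
    below = 0                             # F: frequencies below the class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= half:   # this is the median class
            return lower + (half - below) / f * width
        below += f
    raise ValueError("the distribution has no frequencies")


limits = [45, 50, 55, 60, 65, 70, 75]
col1 = [2, 10, 21, 55, 40, 32, 5]     # N = 165
col2 = [2, 10, 55, 21, 57, 32, 5]     # N = 182

median_col1 = grouped_median(limits, col1, 5)
median_col2 = grouped_median(limits, col2, 5)
print(round(median_col1, 3), round(median_col2, 3))
```

The sketch assumes equal class intervals; with unequal intervals the width of the median class itself would have to be supplied.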

4. The Arithmetic Mean.—The arithmetic mean, M, is the
type of average that is most often used. It is the sum of the
X values divided by their number, N:

$$M = \frac{\sum X}{N}. \tag{5}$$*

For example, in the case of the ungrouped values, 3, 7, 2, 12,
1, 16, 4, representing the numbers of children in seven Italian
immigrant families, their sum is 45, and there are seven of them,
so that M = 45/7 = 6.43.

If some of the above values had occurred more than once, we
might have

X: 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 7, 7, 7, 12, 12, 16, 16 (sum, 111)

$$M = \frac{111}{18} = 6.17$$

But, as shown in Chap. V, this long array may be condensed:

* The Greek letter, Σ, capital Sigma, means to sum, or add, the X values.


Table 24.—Number of Children in 18 Italian Immigrant Families

Children (X)    Families (f)*    fX†
     1               1             1
     2               2             4
     3               3             9
     4               5            20
     7               3            21
    12               2            24
    16               2            32
Total               18           111

* Frequency.

† Frequency multiplied by X.

In the case of grouped data, it is more convenient to write
formula (5) in the form

$$M = \frac{\sum fX}{N}, \tag{6}$$

where f is the frequency.

Substituting in formula (6) N = 18 and the total of the third
column of the array just above,

$$M = \frac{111}{18} = 6.17,$$

as before.
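That formulas (5) and (6) give the same result may be checked with a brief Python sketch. The names `mean_from_frequencies`, `children`, and `families` are assumptions chosen for the example; the data are those of Table 24.

```python
def mean_from_frequencies(values, freqs):
    """Arithmetic mean by formula (6): M = sum(f * X) / N."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


children = [1, 2, 3, 4, 7, 12, 16]     # X
families = [1, 2, 3, 5, 3, 2, 2]       # f

# Formula (6), applied to the condensed table.
mean_condensed = mean_from_frequencies(children, families)

# Formula (5), applied to the long array obtained by repeating each X f times.
long_array = [x for x, f in zip(children, families) for _ in range(f)]
mean_long = sum(long_array) / len(long_array)

print(round(mean_condensed, 2), round(mean_long, 2))
```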

Formula (6) may be applied to any frequency distribution, e.g.,
that of Table 25.

Table 25.—Height of 165 American Adult Males of Swiss Descent

      Height             Adult males
Inches       X*         f          fX
45-49       47.5        2         95.0
50-54       52.5       10        525.0
55-59       57.5       21      1,207.5
60-64       62.5       55      3,437.5
65-69       67.5       40      2,700.0
70-74       72.5       32      2,320.0
75-79       77.5        5        387.5
Total                 165     10,672.5

* Mid-points.

$$M = \frac{10{,}672.5}{165} = 64.68.$$


As pointed out earlier, the mean calculated from a frequency 
table in which the mid-points are not identical with the means 
within the intervals is, of course, somewhat inaccurate, as is 
any other average or statistic found from such a table. 

It is possible to simplify the calculations needed to find the
arithmetic mean (usually called simply the mean) in a frequency
distribution such as that of Table 25. Suppose that the mid-points
of any distribution are $X_1$, $X_2$, $X_3$, etc. They can be
represented by the diagram of Fig. 29, where $f_1$ is the frequency
in the $X_1$ interval, etc., measured along the Y axis.

Fig. 29.—Diagram used in derivation of formula (13).

By formula (6),

$$M = \frac{\sum fX}{N}.$$

But suppose that we choose to measure the X values from some
arbitrarily assumed or "guessed" point on the X axis, say A,
in Fig. 29. Then

$$X_1 = A + X_1', \quad X_2 = A + X_2', \text{ etc.,} \tag{7}$$

where the X′'s represent the distances of the X's measured from
A.

If we further choose to reduce the size of the X′'s by dividing
them by the size of the class interval, i, or other constant, we have

$$\frac{X_1'}{i} = d_1, \quad \frac{X_2'}{i} = d_2, \text{ etc.,} \tag{8}$$

or

$$X_1' = d_1 i, \quad X_2' = d_2 i, \text{ etc.} \tag{9}$$

Substituting the values of $X_1'$ and $X_2'$ from (9) in (7),

$$X_1 = A + d_1 i, \quad X_2 = A + d_2 i, \text{ etc.} \tag{10}$$

Substituting from (10) in (6),

$$M = \frac{\sum f(A + di)}{N} = \frac{\sum fA}{N} + \frac{\sum fdi}{N}. \tag{11}$$

Constants can always be placed outside the summation sign,¹ so
that

$$M = \frac{A \sum f}{N} + \frac{i \sum fd}{N}.$$

Now

$$\sum f = N.$$

So that

$$M = \frac{AN}{N} + \frac{i \sum fd}{N},$$

or

$$M = A + \frac{i \sum fd}{N}. \tag{13}$$


Let us apply formula (13) to find the mean of the frequency 
distribution of Table 26: 


Table 26.—Height of 165 American Adult Males of Swiss Descent

 X*      X' = X - A    d = X'/i      f       fd
47.5        -15           -3         2       - 6
52.5        -10           -2        10       -20
57.5        - 5           -1        21       -21
62.5          0            0        55         0
67.5        + 5           +1        40       +40
72.5        +10           +2        32       +64
77.5        +15           +3         5       +15
Total                              165       +72

* Mid-points.


In the above table, by arbitrary choice, the assumed mean is
A = 62.5. Also, i = 5, $\sum fd = 72$, and N = 165.

1 Notice the principle that

$$\sum (X + Y) = \sum X + \sum Y. \tag{12}$$

Therefore, substituting in formula (13),

$$M = 62.5 + \frac{5(72)}{165},$$

$$M = 62.5 + 5(.436),$$

$$M = 62.5 + 2.18,$$

$$M = 64.68,$$

which is the same as we found the mean to be by the "long"
method. The calculations required in Table 26 are greatly
reduced compared with those in Table 25.
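The short method of formula (13) can be compared with the long method of formula (6) in a small Python sketch. The function name `mean_short_method` and its parameters are assumptions made for the example; the data are the mid-points and frequencies of Tables 25 and 26.

```python
def mean_short_method(midpoints, freqs, assumed_mean, divisor):
    """Arithmetic mean by formula (13): M = A + i * sum(f * d) / N."""
    n = sum(freqs)
    # d is the deviation from the assumed mean in units of the divisor i
    sum_fd = sum(f * (x - assumed_mean) / divisor
                 for x, f in zip(midpoints, freqs))
    return assumed_mean + divisor * sum_fd / n


midpoints = [47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5]
freqs = [2, 10, 21, 55, 40, 32, 5]

mean_a = mean_short_method(midpoints, freqs, assumed_mean=62.5, divisor=5)
mean_b = mean_short_method(midpoints, freqs, assumed_mean=67.5, divisor=5)
mean_long = sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

print(round(mean_a, 2), round(mean_b, 2), round(mean_long, 2))
```

Running the calculation from two different assumed means, as here, is a convenient check on the arithmetic, since the result should not depend on where A is placed.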

The second column of Table 26 is inserted for explanatory 
purposes only, and is omitted except when irregular class intervals 
cause difficulties. Table 27 illustrates the usual form for 
computation. 


Table 27.—Number of Relief Cases per Block in a Slum Area of a City

  X        d       f       fd
 1.5      -3       4      -12
 3.5      -2      10      -20
 5.5      -1      14      -14
 7.5       0      26        0
 9.5      +1      19      +19
11.5      +2      14      +28
13.5      +3       8      +24
15.5      +4       4      +16
17.5      +5       1      + 5
Total            100      +46

$$M = 7.5 + 2\left(\frac{46}{100}\right) = 8.42.$$

Notice that it makes no difference in the result where the
d = 0, i.e., the assumed mean, is placed. A good way to check
the work is to perform the calculations from two different
assumed means.

Formula (13) also holds for irregular class intervals if i, 
which may be any convenient divisor as well as the size of the 
class interval, is held constant. This may be illustrated below. 

Consider this table of age distribution for Wisconsin, from the
1930 census.

Table 28.—Age Distribution of Population, Wisconsin, 1930

Age            Per cent, f     X*     X - A    d = (X - A)/5      fd     Accumulated f†
Under 5             9.2        2.5    -20.0        -4.0          -36.8        9.2
5-9                 9.9        7.5    -15.0        -3.0          -29.7       19.1
10-14               9.7       12.5    -10.0        -2.0          -19.4       28.8
15-19               9.2       17.5    - 5.0        -1.0          - 9.2       38.0
20-24               8.3       22.5      0            0              0        46.3
25-29               7.7       27.5    + 5.0        +1.0          + 7.7       54.0
30-34               7.4       32.5    +10.0        +2.0          +14.8       61.4
35-44              14.0       40.0    +17.5        +3.5          +49.0       75.4
45-54              10.6       50.0    +27.5        +5.5          +58.3       86.0
55-64               7.2       60.0    +37.5        +7.5          +54.0       93.2
65-74               4.6       70.0    +47.5        +9.5          +43.7       97.8
75 and over         2.0        ?                     ?              ?        99.8
Total              99.8                                         +132.4

* Mid-points. The census records age in whole years, as of the last birthday. But since
the actual ages are not discrete, age should be treated as continuous. Otherwise, all
averages will be too low.

† Accumulated frequencies.

Finding the mean for the table below age 75¹ by substituting
in formula (13),

$$M = 22.5 + 5\left(\frac{132.4}{97.8}\right) = 22.5 + 5(1.354),$$

$$M = 29.27.$$
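With irregular class intervals, formula (13) still applies so long as the divisor i is held constant. The following Python sketch repeats the calculation for Table 28 below age 75. The variable names are assumptions made for the example; the mid-points and percentages are those of the table, and the open interval "75 and over" is left out because it has no mid-point.

```python
midpoints = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 40.0, 50.0, 60.0, 70.0]
percents = [9.2, 9.9, 9.7, 9.2, 8.3, 7.7, 7.4, 14.0, 10.6, 7.2, 4.6]

assumed_mean = 22.5    # A
divisor = 5            # i, held constant although the intervals vary

# d = (X - A) / i for each class, then the sum of f * d
sum_fd = sum(f * (x - assumed_mean) / divisor
             for x, f in zip(midpoints, percents))
total = sum(percents)                         # 97.8 per cent below age 75

mean_short = assumed_mean + divisor * sum_fd / total
mean_long = sum(f * x for x, f in zip(midpoints, percents)) / total

print(round(sum_fd, 1), round(mean_short, 2))
```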

In such a table as this, with an open interval, only the median
and mode can be found for the total table. Why? What is
the median value for Table 28?

1 The mean, of course, cannot be found for the table including the open
interval, "75 and over," because no mid-point can be assigned to an open
interval.

5. Interpretation of the Common Averages.—The arithmetic
mean, M, is the most familiar type of average. It is amenable
to algebraic operations which cannot be applied to the median
or mode. Suppose we know that the mean of one distribution of
50 items is 4, and the mean of a second comparable distribution
of 75 items is 6. Then the mean of both distributions is
$\frac{(4 \times 50) + (6 \times 75)}{50 + 75} = 5.2$. The only accurate way of finding
the median or mode of the total distribution is actually to combine the
distributions, interval by interval, and recompute the median
and mode for the combined distribution, just as was done for the
separate distributions. If there are several medians given, it is
possible to find the median median, but it is not likely to be the
same as the median of the combined distributions. Although
the mean of two or more medians is sometimes used, the meaning
of such a combination of averages is not clear. "A correct total
cannot be obtained by multiplying the median by the number of
items" in a distribution.¹
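The contrast between the mean and the median under combination can be sketched in Python. The function name `combined_mean` is an assumption made for the example, and the two small groups used for the median are hypothetical data invented only to show that weighting two medians does not recover the median of the combined series.

```python
from statistics import median


def combined_mean(means, sizes):
    """Mean of several distributions from their means and numbers of items."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


# The example of the text: 50 items with mean 4, 75 items with mean 6.
both = combined_mean([4, 6], [50, 75])

# Hypothetical groups (not from the text) with medians 2 and 5.
group_a = [1, 2, 100]
group_b = [3, 4, 5, 6, 7]
true_median = median(group_a + group_b)
weighted_medians = combined_mean(
    [median(group_a), median(group_b)],
    [len(group_a), len(group_b)],
)

print(both, true_median, weighted_medians)
```

For these hypothetical groups the weighted combination of the medians differs from the median of the pooled values, which is why the pooled distribution has to be recomputed.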

A second characteristic of the mean is that it alone of the 
three averages reflects the exact value of every item. If extreme 
values occur in a series, they affect the mean much more than 
the median or the mode, because the median is affected only 
by the circumstance that an item is greater or smaller than the 
median item—the amount of the difference being of no conse¬ 
quence—and the mode is affected only by whether or not the 
size of a value throws it into one class interval or another. Con¬ 
sider the series of ages in years, 2, 4, 7, 10, 13, 15, 19. M = 10, 
Md = 10. If the three items that are larger than 10 are replaced by 
three others also larger than 10, the Md stays the same, but the 
M changes. Thus for 2, 4, 7, 10, 58, 70, 80, M = 33, Md = 10. 
This is sometimes an advantage of the mean, and sometimes a 
disadvantage. If the extreme values are regarded as atypical 
of the series, the median will be a better average than the mean, 
because the median is less influenced by such values. If, on 
the other hand, the extreme values are thought to be an integral 
part of the series and to deserve full weight, then the mean is 
more appropriate than the median. In series where the mean 
seems inappropriate, it is often advisable to question the repre¬ 
sentativeness of any average, and to drop the atypical items. 
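The effect of extreme values on the two averages can be checked directly. This sketch uses the two series of ages given above and the standard-library `statistics` module; the variable names are assumptions made for the example.

```python
from statistics import mean, median

ages = [2, 4, 7, 10, 13, 15, 19]
ages_with_extremes = [2, 4, 7, 10, 58, 70, 80]   # three largest items replaced

# The median depends only on which items lie above or below it;
# the mean reflects the exact size of every item.
print(mean(ages), median(ages))
print(mean(ages_with_extremes), median(ages_with_extremes))
```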

A third important trait of the mean is that it usually changes
less than the other two averages, from sample to sample.
Suppose that the I.Q.'s of the first 100 students met on a college
campus are taken and the M, Md, and Mo of these I.Q.'s are
computed. The same thing is done with a second hundred students,
a third, and so on. Then the differences between the means
of the several samples will generally be less than the differences
between the medians or the modes. This sampling stability of
the mean is very much in its favor.

1 Willford I. King, The Elements of Statistical Method, p. 131, The
Macmillan Company, New York, 1918.

For such reasons as the above, in the averaging of measurements
the arithmetic mean is always to be preferred to the
median or mode unless it is felt to be much less representative
of the series than they are, or unless, because of open-end class
intervals, the mean cannot be calculated. When we have to deal
with a series of ranked items, rather than measured values, however,
only the median applies.

A frequency distribution is exactly balanced along the perpendicular
erected at the mean. The sum of the deviations of a series of
values from their mean with regard for signs, i.e., the algebraic sum, is
always zero. This is not true of the other averages except in perfectly
symmetrical distributions, where the mean, median, and mode all
coincide (see Fig. 30). A distribution is symmetrical when equal
frequencies occur at equal distances above and below the mean,
as in Table 29 and Fig. 30.

Fig. 30.—Graph of symmetrical frequency distribution of Table 29.
(Horizontal axis: number of heads.)

On the other hand, when signs are disregarded, the sum of the
deviations is least in the case of the median.
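Both properties of the deviations can be verified numerically. The sketch below uses the series 2, 4, 7, 10, 58, 70, 80 from earlier in this section; the helper names `sum_signed_deviations` and `sum_absolute_deviations` are assumptions made for the example.

```python
from statistics import mean, median


def sum_signed_deviations(values, center):
    """Algebraic sum of the deviations from a chosen point."""
    return sum(x - center for x in values)


def sum_absolute_deviations(values, center):
    """Sum of the deviations from a chosen point, signs disregarded."""
    return sum(abs(x - center) for x in values)


values = [2, 4, 7, 10, 58, 70, 80]
m, md = mean(values), median(values)

print(sum_signed_deviations(values, m))       # zero at the mean
print(sum_absolute_deviations(values, md))    # least at the median
print(sum_absolute_deviations(values, m))     # larger at the mean
```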







Typically, in distributions that are not symmetrical or bell¬ 
shaped, but skewed, i.e., extending farther on one side than on 
the other, the mean is pulled farthest in the direction of the 
skewness (because of its sensitiveness to extreme values), the 
mode is nearest the end of the scale opposite the direction of 
the skewness, and the median falls somewhere in between the 
other two (see Fig. 31). Indeed, in moderately skewed dis¬ 
tributions, the median is generally about one-third of the dis¬ 
tance from the mean to the mode, a fact utilized in formula (2) 
above. If the three averages are calculated for the skew dis¬ 
tribution of Table 28 below age 75, using formula (2) for the mode, 
they will be found to fall in this way (M = 29.27; Md = 27.34; 
Mo = 23.48). 



Fig. 31.—Skewed frequency distributions.

The usefulness of any average usually depends upon how
representative it is of its distribution or series, i.e., upon what
proportion of the items in the series is close to the average.
Although it is mathematically possible to calculate the mean,
median, or mode for any series, the concept of the average as a
value representative of the series has much more validity in the
case of some series than of others. It is most valid for symmetrical
distributions, and least valid for distributions shaped like
the letter J (or reversed J), or the letter U, illustrated in Table
30, cols. (2) and (3), respectively, and Figs. 33, 34. In the case of
J and U shaped distributions, any average is likely to conceal more
important information than it reveals, and for this reason it is
usually advisable not to compute averages for distributions of
such extreme types. Perhaps the mode is the best of the
three averages in situations of this kind; but even its value is
questionable.

Table 30.—Age Distributions (Hypothetical Data)

Years of age

Enlarging on the last point, special precaution is necessary to
avoid the use of averages to represent a group that varies widely
within itself. Thus a single infant mortality rate for a county
containing a large city and a rural area in which the rates are
very different is likely to be not only meaningless, but misleading.
This point must be kept constantly in mind in most statistical
problems, e.g., the calculation of a correlation coefficient. The
latter, which is an average, may indicate a moderate amount of
relationship over the whole table, whereas actually there is no
relationship at one end of the table and a close relationship at
the other (see Chap. X).

Fig. 32.—Graph of roughly symmetrical frequency distribution of Table 30,
col. (1).

Fig. 33.—Graph of J-shaped frequency distribution of Table 30, col. (2).

Fig. 34.—Graph of U-shaped frequency distribution of Table 30, col. (3).

(In each figure, X = years of age, on a scale from 0 to 25.)

It should be noticed that an average, usually the mean, may
sometimes legitimately be used for the purpose of resolving a
series of values into a single composite value, whether the latter is
"representative" of the values in the series or not. This is the
case when the chief interest lies merely in comparing the composite
values of two or more series, as the mean size of income of
all workers with the mean size of income of unskilled laborers
alone.

In most cases, it is important to exhibit the table of the fre¬ 
quency distribution as a whole, so that the distribution of the 
items, as well as their averages, may be known to the reader. 







It is also a practice in doubtful cases to present all three averages 
side by side, so that their differences may be seen. This, how¬ 
ever, may merely throw upon the reader the responsibility of 
choosing an average. 

6. The Geometric Mean.—In averaging a series of numbers 
that bear an approximately constant ratio to one another, like 
2, 4, 8, 16, none of the three averages described above is as 
appropriate as the geometric mean. The geometric mean is 
used to average any series in which changes are expressed as 
rates rather than as absolute differences. It is also preferable 
for averaging some skewed distributions, since it gives less 
weight to extreme variations than does the arithmetic mean. 

The geometric mean is never larger than the corresponding
arithmetic mean, and is smaller unless all the values are equal.
When a series contains a zero or negative
value, its geometric mean cannot be found. Just as the sum
of the plus deviations is equal to the sum of the minus deviations
from the arithmetic mean, so the product of the ratios
of the values smaller than the geometric mean to the geometric
mean is equal to the product of the ratios of the geometric mean
to the values larger than the geometric mean (e.g., the geometric
mean of 5, 8, 10, and 12 is 8.3, and
$\frac{5}{8.3} \times \frac{8}{8.3} \approx \frac{8.3}{10} \times \frac{8.3}{12}$). Also,
corresponding to the fact that when each member of a series is
replaced by the arithmetic mean of the series the sum of the
series is not changed (e.g., 3 + 7 + 5 = 15, and 5 + 5 + 5 = 15),
so, when each member of a series is replaced by the geometric
mean, the product remains the same (e.g., 12 × 34 × 4 = 1,632,
and 11.7735 × 11.7735 × 11.7735 = 1,632).
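These properties of the geometric mean can be checked with a short Python sketch. The function name `geometric_mean` is an assumption made for the example (it is written out here rather than taken from a library), and it rejects zero or negative values, for which no geometric mean exists.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed through logarithms."""
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs positive values only")
    return math.exp(sum(math.log(x) for x in values) / len(values))


g3 = geometric_mean([12, 34, 4])
print(round(g3, 4), round(g3 ** 3))        # product is preserved

series = [5, 8, 10, 12]
g4 = geometric_mean(series)
below = (5 / g4) * (8 / g4)                # ratios of smaller values to G
above = (g4 / 10) * (g4 / 12)              # ratios of G to larger values
print(round(g4, 1), round(below, 4), round(above, 4))
```

With the unrounded geometric mean the two products of ratios agree exactly; the small difference found with 8.3 is due to rounding.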

For an ungrouped series of values $X_1, X_2, \ldots, X_N$, the formula
for the geometric mean is

$$G = \sqrt[N]{X_1 \cdot X_2 \cdots X_N}. \tag{14}$$

For grouped data,

$$G = \sqrt[N]{X_1^{f_1} \cdot X_2^{f_2} \cdots X_n^{f_n}}, \tag{15}$$

where $X_i$ is a mid-point and $f_i$, its exponent,¹ is the corresponding
class frequency. Computation, however, is most conveniently
done by means of logarithms, using the respective formulas:

$$\log G = \frac{1}{N} \sum_{i=1}^{N} \log X_i, \tag{16}$$

$$\log G = \frac{1}{N} \sum_{i=1}^{n} f_i \log X_i, \tag{17}$$

where, for grouped data, $N = \sum f_i$.

1 Exponent means the power to which X is raised, e.g., $(X)^2$. Here the
exponent is 2, the second power.

To illustrate the use of formula (16), the geometric mean of
the rates in col. (4) of Table 32 below is found from a table of
logarithms:¹

$$\begin{aligned}
\log G &= \tfrac{1}{7}(\log 0.015 + \log 0.058 + \log 0.061 + \log 0.047 \\
&\qquad + \log 0.029 + \log 0.011 + \log 0.001) \\
&= \tfrac{1}{7}(8.17609 - 10 + 8.76343 - 10 + 8.78533 - 10 \\
&\qquad + 8.67210 - 10 + 8.46240 - 10 + 8.04139 - 10 + 7.00000 - 10) \\
&= \tfrac{1}{7}(57.90074 - 70) \\
&= 8.27153 - 10 \\
G &= 0.019
\end{aligned}$$

Notice that the geometric mean obtained by formula (16) is unweighted,
i.e., each rate is given equal weight. The unweighted
arithmetic mean of the same rates is 0.03171, while the weighted
arithmetic mean rate, from cols. (2) and (3) of Table 32, is
26,326/790,193 = 0.03331.
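The logarithmic computation of formula (16) can be reproduced in Python with common (base 10) logarithms. The seven rates are those of the worked example; the fifth is taken as 0.029, the value whose logarithm is 8.46240 - 10. The variable names are assumptions made for the example.

```python
import math

rates = [0.015, 0.058, 0.061, 0.047, 0.029, 0.011, 0.001]

# Formula (16): log G is the mean of the logarithms of the rates.
log_g = sum(math.log10(r) for r in rates) / len(rates)
g = 10 ** log_g

arithmetic = sum(rates) / len(rates)       # unweighted arithmetic mean

print(round(log_g + 10, 5), "- 10")        # in the form used by log tables
print(round(g, 3), round(arithmetic, 5))
```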

The total column of the table in Exercise 1 below shows a
skewed distribution, so that the geometric mean should be more
representative of it than the arithmetic mean. By formula (17),

$$\begin{aligned}
\log G &= \tfrac{1}{391}(73 \log 2 + 96 \log 6 + 101 \log 10 + 48 \log 14 \\
&\qquad + 52 \log 18 + 21 \log 22) \\
&= \tfrac{1}{391}[73(0.30103) + 96(0.77815) + 101(1) + 48(1.14613) \\
&\qquad + 52(1.25527) + 21(1.34242)] \\
&= \tfrac{1}{391}(21.97519 + 74.70240 + 101.00000 + 55.01424 \\
&\qquad + 65.27404 + 28.19082) = 0.88531 \\
G &= 7.68
\end{aligned}$$

The arithmetic mean is 9.72 and the median is 9.05. Formula
(17) gives a weighted geometric mean, or geometric mean of a
frequency distribution.
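Formula (17) can be sketched in Python in the same way. The function name `weighted_geometric_mean` is an assumption made for the example; the mid-points and frequencies are those appearing in the worked calculation above.

```python
import math


def weighted_geometric_mean(midpoints, freqs):
    """Geometric mean of a frequency distribution by formula (17)."""
    n = sum(freqs)
    log_g = sum(f * math.log10(x) for x, f in zip(midpoints, freqs)) / n
    return 10 ** log_g


midpoints = [2, 6, 10, 14, 18, 22]
freqs = [73, 96, 101, 48, 52, 21]          # N = 391

g = weighted_geometric_mean(midpoints, freqs)
arithmetic = sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

print(round(g, 2), round(arithmetic, 2))
```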

1 See Appendix, Table 7, and accompanying explanation.


Notice the application of the geometric mean to the problem
of estimating the population midway between two decennial
censuses. In Table 31 the population of the United States in
millions is shown at 10-year intervals from 1790 to 1940.

Table 31.—Population of the United States, 1790-1940
(In millions)

Year    Population      Year    Population
1790        3.93        1870       38.56
1800        5.31        1880       50.16
1810        7.24        1890       62.95
1820        9.64        1900       75.99
1830       12.87        1910       91.97
1840       17.09        1920      105.71
1850       23.19        1930      122.78
1860       31.44        1940      131.41 (prelim.)

When these figures are plotted, we get the absolute growth curve
shown in Fig. 35.

Fig. 35.—Absolute growth of population, United States, 1790-1940.

Now suppose it is wanted to estimate the
population in 1795, midway between the censuses of 1790 and
1800. If we take the arithmetic mean of the populations at
1790 and at 1800, we have

$$\frac{5.31 + 3.93}{2} = 4.62 \text{ millions.}$$




























This evidently assumes that the absolute amount of population
increase is the same over equal periods of time, since
4.62 - 3.93 = 0.69, and 5.31 - 4.62 = 0.69. From Table 31, however, we see
that the differences, 5.31 - 3.93 = 1.38 and 7.24 - 5.31 = 1.93,
are not equal, and this is borne out by inspection of Fig. 35.
Actually, around the dates 1790 and 1800, as far as we can judge
from the given data, the absolute growth in population was
increasing. Under these conditions, the growth curve between
1790 and 1800 would probably be concave, as shown by line a
in Fig. 36. The population in 1795 would then be somewhat
less than that found by the method of the arithmetic mean,
which implies a straight line rather than a concave trend (line b
in Fig. 36). On the simple assumption that the rate of annual
increase was constant between 1790 and 1800, the growth curve
will be concave, and the geometric mean will give the exact
population in 1795. The geometric mean is, therefore, usually
regarded as the logical average to use when the growth curve is
concave. The formula may be written

$$P = \sqrt{P_0 P_{10}} = (P_0 P_{10})^{1/2}, \tag{18}$$

where P is the population midway between the two censuses, $P_0$
is the population at the first census, and $P_{10}$ is the population at
the second census. Substituting in this formula,

$$P = \sqrt{3.93(5.31)} = 4.57 \text{ millions.}$$

The geometric mean is not a suitable average, however, when
the absolute amount of change is less each decade, as happened
between 1930 and 1940. The growth curve is then convex (like
c, Fig. 36), so that both the arithmetic and the geometric means
give too low estimates of the population midway between
censuses.

Fig. 36.—Probable trend of population growth in the United States, 1790-1800.
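The two midpoint estimates for 1795 can be set side by side in a few lines of Python. The variable names are assumptions made for the example; the census figures are those of Table 31, in millions.

```python
import math

p_1790 = 3.93
p_1800 = 5.31

# Arithmetic mean: assumes equal absolute increase in each half of the decade.
arithmetic_estimate = (p_1790 + p_1800) / 2

# Geometric mean, formula (18): assumes a constant rate of increase.
geometric_estimate = math.sqrt(p_1790 * p_1800)

print(round(arithmetic_estimate, 2), round(geometric_estimate, 2))
```

The geometric estimate is the lower of the two, as expected when the absolute growth is increasing over the period.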






If we wish to calculate the constant annual rate of population
increase that was assumed in finding the geometric mean (= 4.57)
above, we apply the formula

$$r = \left(\frac{P_n}{P_0}\right)^{1/n} - 1. \tag{19}$$

If, as before, $P_0 = 3.93$, $P_n = 5.31$, and n = 10, we have

$$r = \left(\frac{5.31}{3.93}\right)^{1/10} - 1 = (1.35)^{1/10} - 1.$$

By logarithms,

$$\log (1.35)^{1/10} = \tfrac{1}{10} \log 1.35 = \tfrac{1}{10}(0.13033) = 0.013033.$$

So

$$(1.35)^{1/10} = 1.03,$$

and r = 1.03 - 1.00 = 0.03.

That is, in finding the geometric mean we assumed that the
population increased at the average rate of about 3 per cent per
year between 1790 and 1800.

For the same problem, the arithmetic mean gives a rate of
$\frac{5.31 - 3.93}{3.93(10)} = 0.035$, or 3.5 per cent, which if assumed to be constant
over the 10-year period would result in a population in 1800 of

$$P_{10} = P_0(1 + r)^{10}. \tag{20}$$¹

$$P_{10} = 3.93(1.035)^{10}.$$

$$\log (1.035)^{10} = 10 \log 1.035 = 10(0.01494) = 0.14940.$$

So

$$(1.035)^{10} = 1.411,$$

and

$$P_{10} = 3.93(1.411),$$

or

$$P_{10} = 5.55 \text{ millions,}$$

whereas, actually, $P_{10} = 5.31$ millions.
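Formulas (19) and (20) can be sketched as a pair of small Python functions. The names `annual_rate` and `project` are assumptions made for the example. Computed without rounding the intermediate logarithms, the results differ slightly in the last place from the figures obtained with a table of logarithms.

```python
import math


def annual_rate(p0, pn, years):
    """Constant annual rate of increase by formula (19)."""
    return (pn / p0) ** (1 / years) - 1


def project(p0, rate, years):
    """Population after a number of years at a constant rate, formula (20)."""
    return p0 * (1 + rate) ** years


r = annual_rate(3.93, 5.31, 10)
print(round(r, 2))                               # about 3 per cent per year

# Compounding the constant rate for 5 years gives the geometric-mean estimate.
print(round(project(3.93, r, 5), 2))

# The rate of 3.5 per cent implied by the arithmetic mean overshoots 1800.
print(round(project(3.93, 0.035, 10), 2))
```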


1 Formulas (19) and (20) may be derived as follows:

Let

$P_0$ = population of the state on Jan. 1, 1930,
$P_1$ = population of the state on Jan. 1, 1931; etc.,
r = constant annual rate of increase.

Then

$$P_1 = P_0 + P_0 r = P_0(1 + r).$$

$$\begin{aligned}
P_2 &= P_1 + P_1 r \\
&= P_0(1 + r) + P_0(1 + r)r \\
&= P_0(1 + r)(1 + r) \\
&= P_0(1 + r)^2.
\end{aligned}$$

Similarly,

$$\begin{aligned}
P_3 &= P_2 + P_2 r \\
&= P_0(1 + r)^2 + P_0(1 + r)^2 r \\
&= P_0(1 + r)^2(1 + r) \\
&= P_0(1 + r)^3, \\
&\;\;\vdots \\
P_{10} &= P_0(1 + r)^{10}.
\end{aligned}$$

If n = number of years between censuses,

$$\begin{aligned}
P_n &= P_0(1 + r)^n, \\
(1 + r)^n &= \frac{P_n}{P_0}, \\
\log (1 + r)^n &= \log \frac{P_n}{P_0}, \\
n \log (1 + r) &= \log \frac{P_n}{P_0}, \\
\log (1 + r) &= \frac{1}{n} \log \frac{P_n}{P_0}, \\
\log (1 + r) &= \log \left(\frac{P_n}{P_0}\right)^{1/n}, \\
1 + r &= \left(\frac{P_n}{P_0}\right)^{1/n}, \\
r &= \left(\frac{P_n}{P_0}\right)^{1/n} - 1.
\end{aligned}$$

7. Population Rates.—The ratio of divorces to population, say
3.2 per 1,000, is an illustration of the kind of rate that is important
for sociologists. Other examples are the crude birth rate (births
per 1,000 population per year) and the crime rate (say, convictions
per 1,000 males 10 years old and over per year). A rate
shows the amount of one variable per given amount of another
variable, e.g., the number of births in relation to a given number
of women of child-bearing age in a population.

In working with population rates, such as marriage rates,
death rates, etc., it is helpful to have in mind what is meant
by a rate. Mathematicians define a rate as the amount of
change in a function (dependent variable) that occurs per unit
change in the independent variable. The rate of travel of an
automobile is the number of miles by which its position in space
(function) changes per change of 1 hour in time (independent 
variable). How does a marriage rate fit the mathematical 
idea of a rate? The usual form of the marriage rate is the 
number of marriages per 1,000 population per year. Here, 
however, there are three variables instead of the two mentioned 
in the mathematical definition of a rate. Which of these is the 
function, which is the independent variable, and how is the third 
variable to be interpreted? The time element is usually regarded 
as the independent variable in rate problems, and the factor 
that varies with time, as the function. In our example, both 
the number of marriages and the size of the population base may 
change from year to year. Either of these alone related to time 
would give a mathematical rate. But we are not interested in 
such a rate. Rather, we want to know how the ratio of marriages 
to total population changes with time. It is, then, this ratio 
that is the function in our marriage rate. 

In the case of the marriage rate, we are primarily interested 
in the annual changes in the number of marriages, and not in the 
change in the population base. The only reason for introducing 
the population base at all is to eliminate it as a cause of change 
in the number of marriages, so that the annual change in the 
number of marriages may be comparable from one population to 
another. 

This raises an important point. Is the population base the 
only factor that needs to be eliminated or controlled in order 
that the marriage rate may mean just what we want it to? In 
order to have the marriage rate as comparable as possible from 
one population to another, should we not also control the factors 
of age and sex composition, so that their influences are removed 
from the rate? That depends on the question we want to 
answer. If our question is, which of two or more total populations
has the higher marriage ratio, regardless of the causes
involved, we do not control age and sex in our ratio. But if
we wish to know which of the populations would have the higher
marriage rate if their age and sex distributions were the same,
we must control age and sex. This leads us to the so-called
age-specific, gross, and net marriage rates for females. In all
such rates, we note as a general principle that the denominator
or base of the final rate should ideally contain only the group
exposed to the event (e.g., if the event is marriage, the group
exposed should be composed exclusively of, say, unmarried
females), while the numerator should contain the number of
events (e.g., marriages) occurring in the year. In the case of
most crude rates, like official birth and marriage rates, this
principle is disregarded.

When marriage or other rates are plotted, it is usually advisable
to plot them on semilogarithmic paper, in order that the rate of
change may be shown by the steepness of the graph. Plotting the 
rates directly on semilogarithmic paper is equivalent to plotting 
the logarithms of the rates, which in turn is similar to plotting 
the percentage of change in the rates from year to year (see Chap. 
VI, Fig. 19). 

It may be of interest to compute two of the most important of 
the refined rates used in current vital statistics. The gross 
reproduction rate is defined as the average number of girls born 
per woman passing through child-bearing age, say 15 to 50 years, 
without mortality, and exposed to the birth rate of a given year. 
The net reproduction rate is simply the gross reproduction rate 
corrected for mortality. In Table 32 these rates have been found 
for Wisconsin, with the year 1934 as the base. The gross rate 
appears as the total of col. (5), and the net rate as the total of 
col. (7). It is seen that each 1,000 women born, if none died and 
all were subjected to the average age-specific rates of 1934, would 
bear 1,110 daughters. However, if these 1,000 women were 
exposed to the death rates found in an appropriate life table, 
they would bear only 995 daughters to start the next generation. 
Since the actual distribution of women by age groups was 
eliminated as a factor in Table 32 from col. (4) on, it is possible 
that there may be a disproportionate number of young females 
in the population of Wisconsin in 1934 and that this may prevent 
the population from actually declining for a time, even though 
the net reproduction rate is less than 1. But if the net reproduction
rate of 1934 should continue until the age distribution was
stabilized, the female population of the state would then begin to 
decrease at the rate of 5 per 1,000 per generation. As a matter 
of fact, the birth rate was unusually low in 1934 on account of 
the economic depression and has since risen somewhat. The 
average birth rate over a period of, say, 3 to 5 years furnishes a 
more stable base than the rate for a single year, and for some 
purposes should be preferred. 



Table 32.—Gross and Net Reproduction Rates in Wisconsin, 1934

Col. (1)  Age groups
Col. (2)* Females, 15-49, July 1, 1934
Col. (3)† Female live births, 1934
Col. (4)‡ Daughters born per female, 15-49, 1934
Col. (5)§ Average daughters born to female, 15-49, in 5-year period
Col. (6)‖ Female survival rates from birth
Col. (7)¶ Average daughters born to a female surviving to age 50

  (1)        (2)*       (3)†     (4)‡     (5)§      (6)‖      (7)¶
 15-19     139,600     2,147    0.015    0.075    0.92512    0.070
 20-24     131,369     7,599    0.058    0.290    0.91480    0.265
 25-29     118,042     7,256    0.061    0.305    0.90117    0.275
 30-34     108,496     5,090    0.047    0.235    0.88626    0.208
 35-39     103,165     2,987    0.029    0.145    0.87016    0.126
 40-44     101,089     1,147    0.011    0.055    0.85071    0.047
 45-49      88,432       100    0.001    0.005    0.82522    0.004
 Total     790,193    26,326             1.110               0.995

* Estimated from the 1930 census with the aid of a life table for Wisconsin.
† Found by applying the percentage of total births, female, in 1934 to total live births,
corrected for underregistration.
‡ Column (3) divided by col. (2).
§ Column (4) multiplied by 5, since a woman in any 5-year age group is assumed to bear
as many daughters in each of the 5 years as in 1934.
‖ Taken from Life Table for White Females in Wisconsin, 1929-1931, prepared by the
Metropolitan Life Insurance Company.
¶ Column (5) multiplied by col. (6).
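
The arithmetic of Table 32 may be retraced with a short program. The sketch below is only an illustration: the variable names are ours, the figures are copied from cols. (2), (3), and (6) of the table, and col. (4) is rounded to three places on the assumption that the table was computed from the rounded rates.

```python
# Figures copied from Table 32 (Wisconsin, 1934); names are illustrative only.
females = [139_600, 131_369, 118_042, 108_496, 103_165, 101_089, 88_432]  # col. (2)
female_births = [2_147, 7_599, 7_256, 5_090, 2_987, 1_147, 100]           # col. (3)
survival = [0.92512, 0.91480, 0.90117, 0.88626, 0.87016, 0.85071, 0.82522]  # col. (6)

INTERVAL_YEARS = 5  # each age group spans 5 years

# Col. (4): daughters born per female in one year, rounded as in the table
# (an assumption about how the printed totals were obtained).
annual_rate = [round(b / f, 3) for b, f in zip(female_births, females)]

# Col. (5): daughters born per female during the whole 5-year interval.
per_interval = [INTERVAL_YEARS * r for r in annual_rate]

# Col. (7): the same, allowing for mortality between birth and the interval.
surviving = [d * s for d, s in zip(per_interval, survival)]

gross_rate = sum(per_interval)  # total of col. (5)
net_rate = sum(surviving)       # total of col. (7)

print("gross reproduction rate:", round(gross_rate, 3))
print("net reproduction rate:  ", round(net_rate, 3))
```

Working from unrounded values of col. (4) would give a slightly higher gross rate, so the agreement with the printed totals depends on the rounding assumption stated above.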


Exercises 

Note: A calculating machine will save time in solving the problems 
in this text. At least the student should own an inexpensive slide rule. 

1. a. Find the crude mode, where appropriate, of each of the following
four series, and of all four combined, using formula (1):

Age of Children in Four Three-generation Kinship Groups

Age of child,     Number of children in kinship group       Total
years             I        II       III      IV             children
(X)*              (f1)†    (f2)     (f3)     (f4)           (f)
   2               10       15       30       18              73
   6               21       20       29       26              96
  10               33       36       18       14             101
  14               20       10        6       12              48
  18               12        7        5       28              52
  22                4        1        3       13              21
Total             100       89       91      111             391

* Mid-point.
† Frequency.

b. By use of formula (2), find the crude mode of the total series of
ages.

2. a. What is the median of each of the six series below? 


Number of Persons per Broken Home 


Set I 

Set II 

Set III 

Set IV 

Set V 

Set VI 

3 





3 

5 





1 

4 





2 

1 





4 

6 

6 

6 


5 

5 

8 

8 

8 


8 

8 

2 

2 

2 

6 

11 

4 

11 

111 

11 

4 

5 

11 





12 



b. What is the median of series IV and VI combined (added by rows)?
Note: These series contain too few cases for the medians to have much 
meaning; they are useful only for practice in finding the median. 

3. a. Calculate the median of the two frequency distributions below:

Percentage of Churches without a Full-time Minister in the Rural
Counties of Two Regions

Percentage          Region I,        Region II,       Regions I and II,
of churches (X*)    counties (f1)    counties (f2)    counties (f)
    2.5                  22                4                 26
    7.5                  94               18                112
   12.5                 221               26                247
   17.5                  85               17                102
   22.5                  67               25                 92
   27.5                  39               14                 53
  Total                 528              104                632

* Mid-point.

b. What is the median of the two distributions combined? How
does it compare with the mean of the medians of the two separate
distributions? What is the meaning of the mean of the medians?

4. The rural counties in 15 states were scored on various points, such
as percentage of homes with telephone, per capita expenditure for
schools, and so on, and the median score for the counties in each state
was determined, giving 15 medians. It was then desired to find the
median score of all counties in the 15 states together. How would you
find this?

5. In the table below, what is the arithmetic mean of (a) the populations
of the counties? (b) the birth rates?

County     Population     Birth rate per
           (X1)           1,000 population (X2)
   1          8,003            19.5
   2         21,054            24.5
   3         34,301            21.1
   4         15,006             9.8
   5         72,573            23.1
   6         15,330            16.4
   7         10,233            17.4
   8         16,848            12.6
   9         37,581            21.2
  10         34,165            16.7
  11         30,503            19.1
  12         16,781            21.6
  13        119,217            18.3
  14         52,745            14.0
  15         18,182            18.6
  16         46,583            16.9
  17         27,037            17.3
  18         42,565            22.1
  19          3,815            15.5
  20         59,928            16.9
  21         11,471            25.2
  22         38,469            19.9
  23         21,953            18.1
  24         13,913            13.5
  25         20,039            19.2


6. Find the mean of the following table by the short method, and 
check it by changing the assumed mean. 







Weekly Wages Received by 500 Women Employed in a Garment Factory

Weekly wages,      Women
(X)                (f)
$2.50- 3.49           5
 3.50- 4.49          71
 4.50- 5.49         126
 5.50- 6.49         132
 6.50- 7.49          98
 7.50- 8.49          47
 8.50- 9.49          23
 9.50-10.49           9
Total               511


7. What is the mean of the table below? 

Annual Net Incomes of 150 Louisiana Cotton Farms, 1936

Income       Farms
(X)*         (f)†
$  500         62
   750         45
 1,000         23
 1,250          8
 1,500          6
 1,750          2
 2,000          2
 2,250          1
 2,500          1
Total         150

* Mid-point.
† Frequency.



8. Calculate the mean number of years on farm reported by Iowa
farmers in 1929. Use deviations from an assumed mean.

Iowa Farm Operators Classified According to Number of Years on
Farm, 1930

Years on farm               Farmers
(X)                         (f)
Under 1 year                 25,625
1 year                       20,140
2 to 4 years                 36,496
5 to 9 years                 33,465
10 years and over*           92,142
Total                       207,868

(Abstract of the Fifteenth Census of the United States, 1930, p. 682)

* Take the mid-point of this interval at 15 years.

9. The arithmetic mean of the number of years on farm reported by
249,588 Alabama farmers in 1930 was 6.1. What is the mean number
of years reported by Iowa and Alabama farmers combined, using the
data of Exercise 8?

10. The counties of Oklahoma are to be grouped according to their 
infant mortality rates in 1939 as published by the Oklahoma Bureau of 
Vital Statistics, with the purpose of correlating these rates with the 
per capita expenditures for public schools. Have you any criticisms 
of this method? 

11. A writer on the family recently made this statement: "The Census
Report for 1930 showed the average size of the American family to be
3.81 persons. But averages tell us little." Can you suggest any
important information that this average conceals?

12. Can you propose a refinement of the crude marriage rate analogous
to the gross reproduction rate described in the text? How would
it differ in meaning from the present crude rate?

13. Calculate the net reproduction rate for your state, and explain
its meaning. Is the population of the state increasing or decreasing
at present? If the answer to this question seems to contradict the net
reproduction rate found, can you reconcile the difference?

14. What do you consider to be the most meaningful base for a divorce 
rate and why? 

15. At what mean rate did the population of Nashville, Tenn., increase
between 1880 and 1890? Between 1920 and 1930? Plot the observed
populations first on rectangular coordinate paper, then on
semilogarithmic paper, and study the differences.

Population of Nashville, Tenn., 1870-1930

Census      Population
 1870          25,805
 1880          43,350
 1890          76,168
 1900          80,865
 1910         110,364
 1920         118,342
 1930         153,866


16. Using the data of Exercise 15, compare the geometric and arithmetic
mean populations of Nashville, Tenn., between 1870 and 1930, and
plot them in the graphs prepared in Exercise 15. Explain the results.

References 

Chaddock, R. E.: Principles and Methods of Statistics, Chaps. VI, VII, VIII,
Houghton Mifflin Company, Boston, 1925.

Croxton, F. E., and D. J. Cowden: Applied General Statistics, Chap. IX,
Prentice-Hall, Inc., New York, 1939.

Yule, G. U., and M. G. Kendall: An Introduction to the Theory of
Statistics, Chap. VII, Charles Griffin & Company, Ltd., London, 1937.



CHAPTER VIII 

MEASURES OF DEVIATION AND PARTITION 


1. Deviation from an Average.—It is seldom possible to give
a good idea of a series of ungrouped values or of a frequency
distribution by means of a single value, or average, alone. It is
generally wise to exhibit the whole distribution in tabular form,
and often to show it graphically as well. Mention of the range of
the values, i.e., the highest and lowest values in the series and the
difference between them, is desirable. It is also important to
accompany the average with some measure of variation or dispersion.
The purpose of a measure of dispersion is to show the
extent to which the individual items in a series vary from their
average. If the average value of the items is known, and also
the amount by which a certain proportion of the items deviate
from that average, a rather satisfactory idea of the distribution
may be conveyed. For example, note the ungrouped items 4, 1,
6, 7, 3, 9, 2, 1, 3, 4, representing the number of years between
marriage and divorce in the case of 10 divorced couples. Their
mean is 4 years. Six out of the 10 cases do not differ from the
mean by more than 2 years. If, therefore, we describe the
distribution to the reader by saying that the mean time between
marriage and divorce is 4 years, and that three-fifths of the cases
do not deviate from the mean by more than 2 years, he should
have a better notion of the distribution than if we merely told
him to imagine 10 couples whose mean time between marriage
and divorce was 4 years.

2. The Average Deviation.—The simplest of the measures of
dispersion is obtained by finding the amount by which each item
deviates from the average value, adding these without regard to
sign, and dividing the sum by the number of items, to obtain
the average amount of deviation. Such a measure of deviation
or dispersion is appropriately called the average deviation, and
is often represented by the symbol A.D.

In the case of ungrouped data, like the above series, 4, 1, 6,
7, 3, 9, 2, 1, 3, 4, representing the number of years between
marriage and divorce for 10 divorced couples, the average deviation
from the mean value of 4 years is found as shown in Table 33.

Table 33.—Computations for the Mean Deviation, Ungrouped Data

X - M_x* = x
4 - 4 =  0
1 - 4 = -3
6 - 4 = +2
7 - 4 = +3
3 - 4 = -1
9 - 4 = +5
2 - 4 = -2
1 - 4 = -3
3 - 4 = -1
4 - 4 =  0

\[ \sum_{1}^{10} |x| = 20\dagger \]

* M_x indicates the mean of the X values.
† The lines | | indicate that signs are disregarded. \(\sum_{1}^{10}\) means to add the 10 items.

If we add the values of x with respect for the signs, the result is 
zero. Disregarding signs, however, the total is 20, and 

\[ \text{A.D.} = \frac{20}{10} = 2. \]

That is, the 10 values differ on the average from their mean by 2 
years. 

A formula for use with grouped data is

\[ \text{A.D.} = \frac{\sum f\,|X - Av|}{N} = \frac{\sum f\,|x|}{N}, \tag{21} \]

where f is the frequency in any class interval, X is the value or
mid-point corresponding to a given frequency, Av is the average
used (mean, median, or mode, usually the mean), x = X - Av,
and N is the number of items or the sum of the frequencies (f).
The calculation of the A.D. from the mean, M, is illustrated in
Table 34. In the table, the x's are obtained, of course, by
subtracting the value of the mean, 0.67, from each of the X
values.

There are short methods of finding the average deviation from
the mean or median, but they are rather cumbersome and will
not be described here.¹

¹ See, for example, H. Sorenson, Statistics for Students of Psychology and
Education, p. 137, McGraw-Hill Book Company, Inc., New York, 1936.





Table 34. —Number of Previous Arrests Recorded for 100 Murderers 



The average deviation is usually smaller when taken from the 
median than when taken from the mean or the mode. 
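
The two computations of this section can be checked with a few lines of code. The following is a minimal sketch, not a prescribed procedure: the function name `average_deviation` is ours, the grouped frequencies (60, 20, 15, 3, 2 for 0 to 4 arrests) are taken from the X and f columns used for the same distribution in Tables 37 and 40, and the median of that distribution is assumed to be 0 because 60 of the 100 cases fall at zero.

```python
def average_deviation(values, freqs=None, center=None):
    """Average deviation: sum of f*|X - center| divided by N.

    `freqs` defaults to 1 for every value (ungrouped data);
    `center` defaults to the arithmetic mean.
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    if center is None:
        center = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n


# Ungrouped data: years between marriage and divorce for 10 couples.
years = [4, 1, 6, 7, 3, 9, 2, 1, 3, 4]
ad_ungrouped = average_deviation(years)

# Grouped data: number of previous arrests (X) and murderers (f).
arrests = [0, 1, 2, 3, 4]
murderers = [60, 20, 15, 3, 2]
ad_from_mean = average_deviation(arrests, murderers)
ad_from_median = average_deviation(arrests, murderers, center=0)  # assumed median

print(ad_ungrouped, round(ad_from_mean, 3), round(ad_from_median, 3))
```

In this example the deviation taken from the assumed median is smaller than that taken from the mean, in agreement with the remark above.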

3. The Standard Deviation. —Because the average deviation 
disregards negative signs, another measure of dispersion, known 
as the standard deviation, has been devised, which is free from 
this objection. It is found by subtracting each X value, or 
in grouped data each mid-point value, from the mean of the X 
values, squaring these differences to make all signs positive, 
multiplying them by their respective frequencies, summing them, 
dividing by the sum of the frequencies, and extracting the
square root. The formula for the standard deviation is, therefore,

\[ \sigma^{*} = \sqrt{\frac{\sum (X - M_x)^2}{N}} \tag{22} \]

for ungrouped data, or

\[ \sigma = \sqrt{\frac{\sum f(X - M_x)^2}{N}} \tag{23} \]

for grouped data. Letting x = X - M_x,

\[ \sigma = \sqrt{\frac{\sum x^2}{N}}, \tag{24} \]

\[ \sigma = \sqrt{\frac{\sum f x^2}{N}}, \tag{25} \]

where σ is the standard deviation of the X values, X is the value
of an item or the value of the mid-point of a group of items, f is
the frequency of the items in a group or class interval (for
ungrouped data, f = 1), and N is the number of items, i.e.,
N = Σf.

* The Greek letter, small sigma, σ, is conventionally used to represent the
standard deviation.

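To save labor in computing the standard deviation for a large
frequency table, a short method is commonly used:¹

\[ \sigma = i\sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}, \tag{26} \]

where d is the deviation of the mid-points from a guessed mean in
class interval units, i = width of class interval. This formula
may also be modified for use with ungrouped data by taking the
assumed² mean at zero, so that d = X, f = 1, and i = 1:

\[ \sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}, \tag{27} \]

or

\[ \sigma = \frac{1}{N}\sqrt{N\sum X^2 - \left(\sum X\right)^2}. \tag{28} \]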

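¹ Derivation of formula (26):
By definition,

\[ \sigma = \sqrt{\frac{\sum f(X - M)^2}{N}}. \tag{23} \]

From Chap. VII, formulas (10) and (13),

\[ X = A + id, \tag{a} \]

\[ M = A + i\,\frac{\sum f d}{N}. \tag{b} \]

Substituting from (a) and (b) in (23),

\[ \sigma = \sqrt{\frac{\sum f\left(A + id - A - i\,\frac{\sum f d}{N}\right)^2}{N}}, \]

\[ \sigma = i\sqrt{\frac{\sum f\left(d - \frac{\sum f d}{N}\right)^2}{N}}, \tag{c} \]

\[ \sigma = i\sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}. \tag{26} \]

² See Chap. VII.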





In Chap. VII, we had the ungrouped data, 3, 7, 2, 12, 1, 16, 4, 
representing the numbers of children in seven Italian immigrant 
families. The mean number of children per family was found 
to be 6.43. What is the standard deviation? If we use the
long method of formula (22) or (24) above, we require the
computations shown in Table 35.

Table 35.—Computations for the Standard Deviation, Ungrouped Data
(Long method)

  X        X - M_x                (X - M_x)^2
  3      3 - 6.43 = -3.43       (-3.43)^2 = 11.76
  7      7 - 6.43 = +0.57       ( 0.57)^2 =  0.32
  2      2 - 6.43 = -4.43       (-4.43)^2 = 19.62
 12     12 - 6.43 = +5.57       ( 5.57)^2 = 31.02
  1      1 - 6.43 = -5.43       (-5.43)^2 = 29.49
 16     16 - 6.43 = +9.57       ( 9.57)^2 = 91.58
  4      4 - 6.43 = -2.43       (-2.43)^2 =  5.90
Total                                      189.69

Substituting in formula (22),

\[ \sigma = \sqrt{\frac{189.69}{7}} = 5.21. \]

For the short method of formula (27) or (28), we need only the
two totals, as shown in Table 36.

Table 36.—Computations for the Standard Deviation, Ungrouped Data
(Short method)

  X       X^2
  3         9
  7        49
  2         4
 12       144
  1         1
 16       256
  4        16
 --       ---
 45       479

Substituting in formula (27),

\[ \sigma = \sqrt{\frac{479}{7} - \left(\frac{45}{7}\right)^2} = 5.21, \]

as before. The saving of labor in comparison with the first
method is evident.
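
As a check on Tables 35 and 36, the long and short methods may be run side by side. The sketch below is illustrative only; the variable names are ours, and the data are the seven family sizes given in the text.

```python
import math

# Numbers of children in the seven families of the example.
children = [3, 7, 2, 12, 1, 16, 4]
n = len(children)

# Long method, formula (22): deviations from the mean, squared and averaged.
mean = sum(children) / n
sd_long = math.sqrt(sum((x - mean) ** 2 for x in children) / n)

# Short method, formula (27): only the totals of X and X squared are needed.
sum_x = sum(children)                    # total of the X column of Table 36
sum_x2 = sum(x * x for x in children)    # total of the X^2 column of Table 36
sd_short = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)

print(sum_x, sum_x2, round(sd_long, 2), round(sd_short, 2))
```

The long method here uses the unrounded mean, so its sum of squares differs slightly from the 189.69 of Table 35, which was built from deviations rounded to two places; the standard deviation agrees to two decimals.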

Let us next find the standard deviation of Table 34 above, 
and compare it with the average deviation previously obtained 
for the same table. We shall again first employ the long method, 
to clarify the meaning of the arithmetic, and to enable the 
student to compare the amount of work required relative to the 
short method to follow. The formula that describes the long 
method for grouped data is formula (23) or (25), which calls for 
the computations shown in Table 37. The mean of the table is 
0.67. 


Table 37.—Computation of Standard Deviation for Table 34
(Long method)

  X        f      X - M_x = x       fx       f(X - M_x)^2 = fx^2
  0       60        -0.67         -40.20          26.93
  1       20        +0.33         + 6.60           2.18
  2       15        +1.33         +19.95          26.53
  3        3        +2.33         + 6.99          16.29
  4        2        +3.33         + 6.66          22.18
Total    100                        0.00          94.11

Substituting in formula (23) or (25),

\[ \sigma = \sqrt{\frac{94.11}{100}} = 0.97. \]

Turning now to the short method of formula (26), the steps are
worked out in Table 38. Notice the so-called Charlier check
included in Table 38: Σf + 2Σfd + Σfd² = Σf(d + 1)², or
100 + 2(-33) + 105 = 139, which is the total of the last
column of the table. This checks all of the work of the table.

Table 38.—Computation of Standard Deviation for Table 34
(Short method)

  X        f       d       fd      fd^2     f(d + 1)^2
  0       60      -1      -60       60          0
  1       20       0        0        0         20
  2       15      +1      +15       15         60
  3        3      +2      + 6       12         27
  4        2      +3      + 6       18         32
Total    100              -33      105        139

Substitution in formula (26) now gives

\[ \sigma = 1\sqrt{\frac{105}{100} - \left(\frac{-33}{100}\right)^2}, \]

\[ \sigma = 0.97, \]

which is the value reached by the long method.¹

The average deviation of Table 34 was found in Sec. 2 above to
be 0.804, while we see that the standard deviation is 0.97. The
standard deviation is never smaller than the average deviation
taken from the same mean, and is ordinarily larger, because
squaring the differences gives greater weight to the extreme values.
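
The short method and the Charlier check lend themselves to mechanical verification. The following sketch assumes the frequencies of Table 38 and an assumed mean of 1 with unit class width, as in the table; the variable names are our own.

```python
import math

arrests = [0, 1, 2, 3, 4]       # X
freqs = [60, 20, 15, 3, 2]      # f
assumed_mean = 1                # guessed mean used in Table 38
width = 1                       # class interval, i

# d: deviation from the assumed mean in class-interval units.
d = [(x - assumed_mean) // width for x in arrests]

n = sum(freqs)
sum_fd = sum(f * di for f, di in zip(freqs, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
charlier_total = sum(f * (di + 1) ** 2 for f, di in zip(freqs, d))

# Charlier check: sum f + 2 sum fd + sum fd^2 must equal sum f(d + 1)^2.
charlier_ok = n + 2 * sum_fd + sum_fd2 == charlier_total

# Formula (26).
sd = width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

# Average deviation from the mean, for comparison with Sec. 2.
mean = sum(f * x for f, x in zip(freqs, arrests)) / n
ad = sum(f * abs(x - mean) for f, x in zip(freqs, arrests)) / n

print(sum_fd, sum_fd2, charlier_total, charlier_ok, round(sd, 2), round(ad, 3))
```

The check confirms only the arithmetic of the columns; it does not guard against an error in copying the frequencies themselves.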

Because of the inaccuracies due to grouping data in class
intervals, the standard deviation squared, called the variance,
of a distribution that is fairly symmetrical² in form is often
corrected by subtracting from it the value i²/12 in the case of a
continuous variable, or (i²/12 - 1/12) in the case of a discrete
variable. In the above problem the variable is discrete and i = 1, so
that we have (0.97)² - (1/12 - 1/12) = (0.97)², and σ remains
unchanged. There is no error of grouping when the variable is
discrete and i = 1. This correction is known as Sheppard's
correction. In its usual form it cannot be applied to very skewed
or asymmetrical distributions.

If we have calculated the standard deviation of each of two
series, and then wish to know the standard deviation of the two
series combined, the latter may be found from the formula

\[ \sigma = \sqrt{\frac{N_1(\sigma_1^2 + M_1^2) + N_2(\sigma_2^2 + M_2^2)}{N} - M^2}, \tag{29} \]

where the subscripts differentiate the two series, and no subscript
indicates the combined series. Where there are more than
two series, a term N_i(σ_i² + M_i²) is inserted in the formula for
each additional series.

¹ In Table 38, it happens that the X values are already in unit step deviation
form (0, 1, 2, etc.), so that very little labor is saved by using the d
column. We might, therefore, have used X in place of d in formula (26).
The student is asked to do this as a check on the calculations in Table 38.

² The distribution should be normal in form. See Chap. IX.

For example, for Table 34 we have N_1 = 100, σ_1² = 0.94,
and M_1² = 0.45. In a second sample of the same kind, given
N_2 = 80, σ_2² = 0.6302, and M_2² = 4.56, the mean of the two samples
combined is M = 1.32. From formula (29), for
the two samples combined, we find

\[ \sigma = \sqrt{\frac{100(0.94 + 0.45) + 80(0.6302 + 4.56)}{180} - (1.32)^2} = 1.16. \]
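
Formula (29) can be written as a small routine that accepts any number of series. This is a sketch under stated assumptions: the function name `combined_sd` is hypothetical, each series is described by its size, mean, and variance, and the means are recovered as the square roots of the M² values quoted in the example (so they carry the rounding of those values).

```python
import math


def combined_sd(series):
    """Standard deviation of several series pooled together, formula (29).

    `series` is a list of (N, mean, variance) tuples, one per series.
    """
    total_n = sum(n for n, _, _ in series)
    # Mean of the combined series: weighted mean of the separate means.
    combined_mean = sum(n * m for n, m, _ in series) / total_n
    # Weighted average of (variance + mean squared) for the separate series.
    mean_square = sum(n * (var + m * m) for n, m, var in series) / total_n
    return math.sqrt(mean_square - combined_mean ** 2)


first = (100, math.sqrt(0.45), 0.94)      # Table 34: N, M, sigma squared
second = (80, math.sqrt(4.56), 0.6302)    # second sample of the example

sigma = combined_sd([first, second])
print(round(sigma, 3))
```

Because the inputs are themselves rounded, the program's result agrees with the figure in the text only to about two significant figures.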


Just as the average deviation is usually a minimum when 
taken from the median, so the standard deviation is a minimum 
when taken from the mean. In fact, the standard deviation is 
practically never taken from any average except the mean, and 
formulas (27) and (28), above, are valid only for the mean. 

4. Effect of Coding¹ on Averages and Measures of Dispersion.—
If the frequencies in a frequency table are divided through by a
constant, k, the averages and measures of dispersion or partition
calculated from the table will not be changed. Since it is
possible to simplify the computation in this way, it is desirable
to use this device whenever the opportunity offers.

The student is asked to test this for himself, using Table 39, 
in calculating the mean and the standard deviation. 


Table 39.—Mean Annual Income of 500 Clerical Workers

Mean Income      Families
(X)              (f)
$  500              25
 1,000             150
 1,500             200
 2,000              75
 2,500              50
Total              500


It is also often convenient to reduce the absolute frequencies to 
percentage frequencies before using them in computation. 
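
The effect of coding the frequencies may be illustrated with Table 39. In the sketch below the helper `mean_and_sd` and the choice of k = 25 are our own; any constant would serve, and a divisor that leaves whole numbers is chosen only for convenience.

```python
import math


def mean_and_sd(values, freqs):
    """Mean and standard deviation of a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, math.sqrt(variance)


incomes = [500, 1_000, 1_500, 2_000, 2_500]   # X, from Table 39
families = [25, 150, 200, 75, 50]             # f, from Table 39

k = 25                                        # assumed coding constant
coded = [f / k for f in families]             # 1, 6, 8, 3, 2

mean_full, sd_full = mean_and_sd(incomes, families)
mean_coded, sd_coded = mean_and_sd(incomes, coded)

# Percentage frequencies are the special case k = N / 100.
percent = [100 * f / sum(families) for f in families]
mean_pct, sd_pct = mean_and_sd(incomes, percent)

print(mean_full, mean_coded, mean_pct)
print(round(sd_full, 2), round(sd_coded, 2), round(sd_pct, 2))
```

The averages are unchanged because the constant k appears in both the numerator and the denominator of each formula and cancels.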

¹ Dividing the frequencies of a distribution by a constant.

5. The Coefficient of Variation.—The average or standard
deviations of two frequency distributions are not directly
comparable, because they depend upon the size of the mean or
median in each case, and upon the particular unit used. For
example, the weights of a herd of elephants may vary on the
average by 280 lb., while the weights of a litter of mice may differ
by 0.1 oz. Yet the mice may show a greater variation than the
elephants relative to their mean weights. Average and standard
deviations may, therefore, be made comparable by expressing
them as percentages of their means or medians. This percentage
is called the coefficient of variation in terms of the average or
standard deviation, and is written

\[ V = \frac{100\,\text{A.D.}}{M}, \tag{30} \]

or

\[ V = \frac{100\,\text{A.D.}}{Md}, \tag{31}* \]

and

\[ V = \frac{100\,\sigma}{M}. \tag{32} \]

It is possible to use the coefficient of variation, V, as a measure
of the representativeness of an average. It may be said, arbitrarily,
that when V is above 50 per cent, it is usually advisable
to abandon the use of an average as a single value intended to give
an idea of the central tendency of a series. The V calculated for
the mean of Table 34 above by formula (30) is

\[ V = \frac{100(0.804)}{0.67} = 120 \text{ per cent.} \]


In this case, V is 70 points above 50 per cent; hence the mean is
obviously a poor device for representing the actual values in this
very skewed or J-shaped distribution. If we apply formula (30)
to the mean of Table 40, below, which is merely a rearrangement
of the frequencies of Table 34 in more symmetrical form for
purposes of illustration, we find that V = 100(0.467)/1.97 = 24
per cent, indicating that the mean represents the values in this
table very well. This result would be expected from an inspection
of the distribution, which appears to be fairly symmetrical in form,
with the largest frequency in the center.

Table 40.—Previous Arrests Recorded for 100 Murderers
(Frequencies of Table 34 rearranged)

  X        f
  0        2
  1       20
  2       60
  3       15
  4        3
Total    100

M = 1.97,
A.D. = 0.467.

* Only one of these formulas should be used in the same comparison.

In using formulas (30), (31), and (32), it will be seen that if
two distributions have equal average or standard deviations, but
unequal means or medians, the one with the larger average will
have the smaller coefficient of variation, V. This is as it should
be, provided that the means or medians used in finding the V's
contain no element that spuriously raises or lowers the values
from which the averages are calculated.

Suppose that the question is asked, Does Table 34 or Table 40
show a greater amount of variability from the mean? In the
case of Table 34 it has been seen that V = 120 per cent, and for
Table 40 it was found that V = 24 per cent. The V's, therefore, show
that Table 34 is 120/24 = 5 times as variable as Table 40, whereas
the average deviations would indicate that the former distribution
was less than twice as variable as the latter.
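
The comparison of Tables 34 and 40 may be reproduced as follows. The sketch assumes formula (30), that is, V based on the average deviation from the mean; the function name is ours, and the frequencies for Table 34 are those shown for the same distribution in Table 37.

```python
def coefficient_of_variation(values, freqs):
    """Return (mean, A.D. from the mean, V in per cent) by formula (30)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    ad = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n
    return mean, ad, 100 * ad / mean


arrests = [0, 1, 2, 3, 4]
table_34 = [60, 20, 15, 3, 2]   # J-shaped distribution
table_40 = [2, 20, 60, 15, 3]   # same frequencies rearranged symmetrically

m34, ad34, v34 = coefficient_of_variation(arrests, table_34)
m40, ad40, v40 = coefficient_of_variation(arrests, table_40)

print(round(m34, 2), round(ad34, 3), round(v34))   # mean, A.D., V for Table 34
print(round(m40, 2), round(ad40, 3), round(v40))   # mean, A.D., V for Table 40
print(round(v34 / v40, 1), round(ad34 / ad40, 1))  # ratio of V's, ratio of A.D.'s
```

The ratio of the unrounded V's is close to, but not exactly, the 5 obtained in the text from the rounded percentages.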

6. Partition Values.—To show the scale values below which
any desired proportion of the frequencies in a distribution fall, a
set of partition values known as quartiles, deciles, etc., or more
inclusively as percentiles, has been devised. These measures
all employ the principle of the median, and apply primarily to
grouped data. Thus, while the median is that scale value below
which half of the values fall, the first quartile, Q1, is the scale
value below which lie one-fourth of the values; the third quartile,
Q3, is the scale value below which lie three-fourths of the values;
the ninth decile, d9, is the scale value below which lie 90 per cent
of the values; the 65th percentile is the scale value below which
lie 65 per cent of the values; and so on. It is, therefore, seen
that each of these measures is merely a particular percentile
value, the median corresponding to the 50th percentile, the first
quartile to the 25th percentile, the third quartile to the 75th
percentile. The general method of finding any value is the same.

Because of logical difficulties, it is seldom that any partition
value except the median is found for ungrouped data.¹

¹ If the attempt must be made, however, it is generally best to accept
rough approximations, rather than insist on exact but imaginary
interpolations. For example, if we are required to furnish the third quartile, Q3,
for the array of 12 ages (3, 5, 6, 9, 11, 16, 20, 21, 24, 25, 26, and 30 years),
we may find the position 12 X 0.75 = 9, and say that 100(9/12) = 75 per cent
of the ages are less than the age of 25 years that occupies 9 + 1 = 10th place
in the array. This statement is correct in the present case; but it is not
correct to say, further, that 100 - 75 = 25 per cent of the ages are greater
than 25 years. If the age 24 years in the array were replaced by a second
age 25 years, then the age 25 years would no longer be greater than 75 per
cent of the ages, but it would still probably be the most appropriate age to
offer as an approximate value for Q3.

When the position, Np, found by multiplying the total number of items,
N, by the given percentage value, p, is not a whole number, the matter is
more complicated. Thus, if we drop the age 30 years from the top of the
above array, we have Np = 11 X 0.75 = 8.25. There is no 8.25th position
in this array, so we have to choose between positions number 8 and 9, or
else interpolate between them. If we take position 8 as the nearest integer,
and add one to it, as we did above, we get position 9. The age corresponding
to this position is 24 years, and we see that eight ages, or 100(8/11) = 72.7
per cent of the ages, are less than this age. Since 72.7 per cent is rather
close to 75 per cent, the age 24 years seems to be the simplest approximate
value to assign to Q3.

Only when no actual position in an array gives a reasonably close
approximation to the meaning of a required percentile is it usually worth while to
interpolate between two positions. If our array above consisted of only the
first 10 ages, to find Q3 we would have pN = 0.75(10) = 7.5. The age in
the eighth position is greater than 100(7/10) = 70 per cent of all the ages,
whereas that in the ninth position is greater than 100(8/10) = 80 per cent
of the ages. Here we might prefer to take the interpolated position,
(7 + 8)/2 = 7.5, so that, assuming continuous or grouped data, the theoretical
age corresponding to it would be greater than 100(7.5/10) = 75 per cent
of the ages in the array. This theoretical age, or value of Q3, must be
halfway between age 20 in seventh position and age 21 in eighth position, or
(20 + 21)/2 = 20.5 years.

Notice that, in ungrouped data, the empirical formula, Np + 1, used
for locating the approximate integral position of such a partition value as
Q3, is replaced by the formula p(N + 1) for determining the median position.

For grouped data, it will be recalled that the median is located
by dividing the total frequency, N, by 2, counting up the column
of accumulated frequencies of the table until the lower limit of
the class interval is reached which contains the median value,
and then interpolating within this interval to determine the
median value. When finding any percentile value other than
the median, we need only change the coefficient of the total
frequency, N. For example, in the case of Q1, we use N/4; for
Q3, 3N/4; for d9, 0.9N; for the 65th percentile, 0.65N; and so on.
The general formula, using P to represent any percentile, median,
decile, or quartile value on the X scale, is

\[ P = L + \frac{pN - F}{f}\, i, \tag{33} \]

where p is the percentile rank or point of division on the frequency
scale expressed as a decimal fraction (e.g., p = 0.75), L is the lower
limit of the interval containing the pth value, N is the total
frequency of the table, F is the sum of the frequencies falling
below (i.e., in class intervals with limits smaller than) L, f is
the number of frequencies in the interval containing the pth
value, and i is the size of interval containing the pth value.

Let us find the values of Q1, Q3, d7, and P33 (33rd percentile)
in Table 41.


Table 41.—Distribution of the Estimated Income among Unmarried
Women of the United States in 1910*

Income             Women      Accumulated     Percentage
(X)                (f)        (F)             accumulated
$  100-  199         10            10             0.55
   200-  299         70            80             4.42
   300-  399        560           640            35.36
   400-  499        530         1,170            64.64
   500-  599        280         1,450            80.11
   600-  699        150         1,600            88.40
   700-  799        110         1,710            94.48
   800-  899         37         1,747            96.52
   900-  999         22         1,769            97.73
 1,000-1,099         16         1,785            98.62
 1,100-1,199         12         1,797            99.28
 1,200-1,299          8         1,805            99.72
 1,300-1,399          5         1,810           100.00
Total             1,810

* From W. I. King, Wealth and Income of the People of the United States, p. 224, 1915.


To find \(Q_1\), we have

\[ pN = 0.25(1{,}810) = 452.5. \]

Counting up (i.e., in the direction of increasing values on the X
scale) the accumulated frequency column of the table, we see that
452.5 lies in the class interval 300-399. Therefore,

\[ L = 300, \qquad F = 80, \qquad f = 560, \qquad i = 100. \]

Substituting in equation (33),

\[ Q_1 = 300 + \frac{452.5 - 80}{560} \cdot 100, \]

or

\[ Q_1 = 366.5. \]

That is, one-fourth of the women earned less than $366.50 a year.
Similarly,

\[ Q_3 = 500 + \frac{1{,}357.5 - 1{,}170}{280} \cdot 100, \]

or

\[ Q_3 = 567, \]

\[ d_7 = 500 + \frac{1{,}267 - 1{,}170}{280} \cdot 100, \]

or

\[ d_7 = 534.6, \]

\[ p_{33} = 300 + \frac{597.3 - 80}{560} \cdot 100, \]

or

\[ p_{33} = 392.4. \]


From these results we notice that three-fourths of the working 
women made less than $567 annually, 70 per cent of them made 
below $534.60, and one-third made under $392.40. Of course, 
there is no point in calculating all of these values except for 
illustrative purposes. We are usually interested in such fractions 
as one-third, one-half, or three-fourths. 
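
The substitutions in formula (33) can be checked mechanically. The short Python sketch below is an illustration only and is not part of the original text. The names TABLE_41 and grouped_percentile are hypothetical, and the sketch assumes, as the worked examples above do, that every class interval has the same width (i = 100) and that L is the lower stated limit of the interval.

```python
# Table 41 as (lower limit of interval, frequency); assumed width i = 100.
TABLE_41 = [
    (100, 10), (200, 70), (300, 560), (400, 530), (500, 280),
    (600, 150), (700, 110), (800, 37), (900, 22), (1000, 16),
    (1100, 12), (1200, 8), (1300, 5),
]


def grouped_percentile(table, p, width=100):
    """Formula (33): P = L + ((pN - F) / f) * i, with p given as a fraction."""
    n = sum(f for _, f in table)      # N, the total frequency
    target = p * n                    # pN, the position sought
    below = 0                         # F, frequencies below the interval
    for lower, f in table:
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * width
        below += f
    raise ValueError("p must lie between 0 and 1")


for label, p in [("Q1", 0.25), ("Q3", 0.75), ("d7", 0.70), ("p33", 0.33)]:
    print(label, round(grouped_percentile(TABLE_41, p), 1))
# Q1 366.5
# Q3 567.0
# d7 534.6
# p33 392.4
```

The loop simply accumulates F until the interval containing pN is reached, which is the "counting up" step described in the text.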

An investigator often requires, not the value below which a 
certain percentage of the frequencies fall, but the reverse of this, 
namely, the percentage of the cases that falls below a certain 
value, that is, the percentile rank of the value. Referring back 
to the ungrouped array of 11 ages used above, viz., 3, 5, 6, 9, 11,
16, 20, 21, 24, 25, and 26 years, we may require the percentile rank
of the person aged 21. Since, by definition, this is equivalent to
asking what percentage of the persons in the array are less than
21 years of age, we note that there are 7 persons out of 11 who
are younger than 21 years, and compute 7/11 = 0.636, or 63.6 per
cent. We then say that the percentile rank of the person aged
21 is approximately 64.

Turning to grouped data, suppose we ask what proportion of
the unmarried women of Table 41 earned less than some minimum
living wage, say $550 a year. Our problem now is, knowing a
value on the X scale, to find the percentage of values on the Y
scale that fall below it. In the present case, it is evident that
1,170 women earned less than $500, and that 280 earned between
$500 and $599. We have \(\frac{550 - 500}{100}(280) = 140\) as the
number of women earning between $500 and $550. Therefore
1,170 + 140 = 1,310 is the number of women who made less
than $550. Expressed as a percentage of the total number of
women workers, we find that \(100\left(\frac{1{,}310}{1{,}810}\right) = 72\)
per cent of the women failed to earn as much as the minimum amount. A
formula for this calculation is

\[ p = \left[F + \frac{f(P - L)}{i}\right]\frac{100}{N} \tag{34} \]

where p is the percentile rank sought, P is the given X scale value,
F is the accumulated frequencies in the class intervals with
limits smaller than those of the interval including P, f is the
frequency of the interval including P, L is the lower limit of this
same interval, i is its width, and N is the total frequency of the
table. Thus, substituting the values of the preceding problem
in formula (34), we get

\[ p = \left[1{,}170 + \frac{280(550 - 500)}{100}\right]\frac{100}{1{,}810}, \]

or p = 72 per cent, as before.

An X scale value corresponding to a given accumulated fre¬ 
quency, or a percentage frequency corresponding to a given X 
scale value, may also readily be found by means of a cumulative 
curve, which was described in Fig. 11, Chap. VI. The student is 
asked to use this device to check the arithmetical results just 
obtained from Table 41 above, preferably plotting the curve from 
the last column of that table. 
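
As an arithmetical check on the percentile ranks found above, the following Python sketch (an illustration added here, not the author's procedure) computes the rank for the ungrouped array of 11 ages and applies formula (34) to Table 41. The function names are hypothetical, and equal class widths of 100 are assumed.

```python
AGES = [3, 5, 6, 9, 11, 16, 20, 21, 24, 25, 26]

# Table 41 as (lower limit of interval, frequency); assumed width i = 100.
TABLE_41 = [
    (100, 10), (200, 70), (300, 560), (400, 530), (500, 280),
    (600, 150), (700, 110), (800, 37), (900, 22), (1000, 16),
    (1100, 12), (1200, 8), (1300, 5),
]


def rank_ungrouped(values, x):
    """Percentage of the values that are smaller than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


def rank_grouped(table, x, width=100):
    """Formula (34): p = (F + f (P - L) / i) * 100 / N."""
    n = sum(f for _, f in table)
    below = 0                          # F
    for lower, f in table:
        if lower <= x < lower + width:
            return (below + f * (x - lower) / width) * 100 / n
        below += f
    raise ValueError("x lies outside the table")


print(round(rank_ungrouped(AGES, 21), 1))     # 63.6
print(round(rank_grouped(TABLE_41, 550), 1))  # 72.4
```

Both results agree with the text, which rounds them to 64 and 72 respectively.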





A measure known as the quartile deviation is sometimes used.
The formula is

\[ Q = \frac{Q_3 - Q_1}{2}. \tag{35} \]

Thus, for Table 41,

\[ Q = \frac{567 - 366.5}{2} = 100.25. \]

The quartile deviation is employed only when the median is the
preferred average.

All these measures of dispersion (quartiles, deciles, percentiles,
quartile deviation) are so-called position values, and have the
same advantages and disadvantages as the median, previously
discussed. In particular, they are insensitive to extreme
values, and cannot be treated algebraically. They are especially
useful in analyzing a skewed frequency distribution, since they
maintain a definite relationship to the distribution, regardless
of its shape.

Fig. 37.—The distance \(Q_3 - Q_1\) includes half of the cases.
[Figure: a frequency curve with the first quartile, the median (Md.),
and the third quartile marked on the horizontal scale.]

7. Comparable Measures or Scores.—When two frequency
distributions are of about the same shape, e.g., both about
symmetrical, both slightly skewed in the same direction, both
J-shaped, etc., distances on their scales are usually compared
in units of their respective standard deviations. Thus, if we
have the distributions of many scores on two independent tests
of a given trait, for each test the deviations of the scores from
the true¹ mean are divided by the true standard deviation,
to get the desired standard scores. Given, for Test I, true
mean = 70, true σ = 10; and for Test II, true mean = 62,
true σ = 12. If subject A scored 80 on Test I and 60 on Test II,
his standard score on Test I is \(\frac{80 - 70}{10} = 1\), and on
Test II is \(\frac{60 - 62}{12} = -0.17\); and his combined score on
the two tests is \(1 + (-0.17) = 0.83\). If subject B scored 75 on
Test I and 65 on Test II, his corresponding standard scores are
\(\frac{75 - 70}{10} = 0.50\) on Test I, and
\(\frac{65 - 62}{12} = 0.25\) on Test II; and his combined
score is 0.75.

¹ By true is meant a statistic derived from many applications of a test,
rather than from a single application, to the same universe or type of subjects.

Where two distributions differ markedly in form, e.g., one being
about symmetrical and the other J-shaped, or one very peaked
and the other flat, the standard deviations do not provide
consistent units for reducing their scale distances to more comparable
terms, because the proportion of frequencies included between
the mean and one standard deviation on each side of it changes
with the form of the distribution. Theoretically, perhaps the
best procedure under these circumstances is to normalize both
distributions, but the method is too complex to introduce here.²
A cruder but much simpler method uses the Q's instead of the σ's
as common denominators. Although Q also has disadvantages,
it is one-half of the range \(Q_3 - Q_1\), within which always falls
the middle half of the frequencies; and in that sense its
interpretation is independent of the shape of the distribution (see Fig. 37).

Suppose now that the distributions of many scores in Tests I
and II above are quite different, being J-shaped to the left
in Test I and skewed to the right in Test II. For Test I the
true median score is 74, and the true Q value is 6; for Test II,
the median score is 59 and Q is 8. We divide the deviations
of the two subjects' scores from the medians by the respective
Q values, and get \(\frac{80 - 74}{6} + \frac{60 - 59}{8} = 1.125\) as
the combined score of subject A, and
\(\frac{75 - 74}{6} + \frac{65 - 59}{8} = 0.917\) as the
combined score of subject B. These may be called the Q scores. 
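
The standard scores and Q scores of subjects A and B can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the helper name combined_score is hypothetical, and the means, medians, σ's, and Q's are those assumed in the text.

```python
def combined_score(scores, centers, spreads):
    """Sum of (score - center) / spread over the tests taken."""
    return sum((x - c) / s for x, c, s in zip(scores, centers, spreads))


# Standard scores: true means 70 and 62, true sigmas 10 and 12.
a_standard = combined_score((80, 60), (70, 62), (10, 12))
b_standard = combined_score((75, 65), (70, 62), (10, 12))

# Q scores: true medians 74 and 59, Q values 6 and 8.
a_q = combined_score((80, 60), (74, 59), (6, 8))
b_q = combined_score((75, 65), (74, 59), (6, 8))

print(round(a_standard, 2), round(b_standard, 2))  # 0.83 0.75
print(round(a_q, 3), round(b_q, 3))                # 1.125 0.917
```

The same function serves both methods because each divides a deviation from a central value by a measure of spread; only the choice of center and spread differs.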

Instead of the standard scores or Q scores described above, the
method of equivalent percentile scores may be used in the effort to
make two independent scales comparable. For each scale, every
percentile or, say, every fifth percentile is found, and these values
are arranged in two parallel series, where corresponding pairs of
values are regarded as equivalent. Thus, in Table 42 below, the
values \(X_1 = 13\) and \(X_2 = 0.1\) are equivalent on the two scales.
The percentile values are found arithmetically from the two
given frequency distributions of scale values by formula (33)
above, or graphically from the ogive curve as illustrated in Fig.
11, Chap. VI. Suppose that we wish to compare the score of
subject 114 on Test \(X_1\), 85, with the score of subject 17 on Test
\(X_2\), 2.6. From Table 42 we see that a score of 2.6 on scale \(X_2\)
is equivalent to a score of 93 on scale \(X_1\). Hence the two
comparable scores are 85 and 93. If either or both of the scores
of subjects 114 and 17 did not appear in Table 42, we would
find the percentile rank of say the second of them by formula
(34) above, and then, using this in formula (33), find the
corresponding value on the \(X_1\) scale. This equivalent \(X_1\) value
would then be compared with the \(X_1\) score of the other subject.

² Paul Horst, Obtaining Comparable Scores from Distributions of
Dissimilar Shape, Journal of the American Statistical Association, Vol. 26, pp.
455-460, 1931.

Table 42.—Two Series of Equivalent Percentile Scale Values:
Attitude toward War

     n      Scale, X1     Scale, X2
             (p_n)*         (p_n)

     5         13            0.1
    10         23            0.4
    15         32            0.5
    20         41            0.6
    25         49            0.65
    30         56            0.8
    35         63            1.0
    40         69            1.2
    45         75            1.4
    50         80            1.6
    55         85            1.9
    60         89            2.2
    65         93            2.6
    70         95            2.9
    75         97            3.2
    80         97.5          3.6
    85         98            3.9
    90         98.5          4.2
    95         99            4.6
   100        100            5.0

* nth percentile scale value.

The above three methods are not applicable to ungrouped or 
scanty data. 
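
The use of Table 42 for converting a score on one scale into its equivalent on the other can be sketched in Python. The sketch is an illustration only. For scores that do not appear in the table it interpolates linearly between adjacent tabled percentiles, which is a simplifying assumption made here in place of formulas (34) and (33), since those require the original frequency distributions. The names PCT, X1, X2, interpolate, and x2_to_x1 are hypothetical.

```python
from bisect import bisect_left

# Table 42: every fifth percentile and the corresponding scale values.
PCT = list(range(5, 101, 5))
X1 = [13, 23, 32, 41, 49, 56, 63, 69, 75, 80,
      85, 89, 93, 95, 97, 97.5, 98, 98.5, 99, 100]
X2 = [0.1, 0.4, 0.5, 0.6, 0.65, 0.8, 1.0, 1.2, 1.4, 1.6,
      1.9, 2.2, 2.6, 2.9, 3.2, 3.6, 3.9, 4.2, 4.6, 5.0]


def interpolate(x, xs, ys):
    """Linear interpolation of y at x; xs must be increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("value lies outside the tabled range")
    i = bisect_left(xs, x)
    if xs[i] == x:                     # the value appears in the table
        return ys[i]
    fraction = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + fraction * (ys[i] - ys[i - 1])


def x2_to_x1(score):
    """Percentile rank of an X2 score, then the X1 value at that rank."""
    rank = interpolate(score, X2, PCT)
    return interpolate(rank, PCT, X1)


print(x2_to_x1(2.6))  # 93, the equivalent of subject 17's score
```

A score of 2.6 on scale X2 is thus returned as 93 on scale X1, as in the text, and may be compared with the X1 score of 85 made by subject 114.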

When the data are inadequate, or when for other reasons we
have more confidence in the ability of two scales to arrange items
in rank order than to measure distances between them, simple
percentile ranks may be used for purposes of comparison. Given 
the scores on a test, the percentile rank is found for each score. 
For example, if 62 per cent of the scores made on a test are 
less than the score 80, the percentile rank of the latter is 62. 
For ungrouped data, the percentile ranks are found by the 
informal method outlined on page 132; for grouped data, the 
percentile ranks are obtained arithmetically from formula (34), 
above, or graphically from an ogive. The weakness of percentile 
ranks is, of course, that they do not reflect the distances between 
the scores on any scale. Thus, the score 70 may have a per¬ 
centile rank of 50, the score 77 a percentile rank of 60, and the 
score 85 a percentile rank of 90, so that the successive scores 
stand in the ratio of 1:1.1, whereas the corresponding successive 
percentile ranks bear the ratios 1:1.2 and 1:1.5, respectively. 
For this reason, the difference between percentile ranks should 
not be interpreted as proportional to the distance between the 
corresponding scale values. 

As a matter of fact, there is usually no feasible method of
treating scores obtained from the use of very different kinds of
scales that makes them strictly comparable.

Exercises 

1. Compare the average deviation and the standard deviation of the 
series below. Find the standard deviation by formulas (24) and (28) 
as a check. 


Number of Dependents in 25 Families on Relief

  Family no.   Dependents      Family no.   Dependents

       1           3               14           5
       2           5               15           3
       3           4               16           3
       4           1               17           2
       5           6               18           4
       6           8               19           1
       7           2               20           3
       8           3               21           4
       9           3               22           3
      10           2               23           6
      11           4               24           2
      12           1               25           3
      13           2








2. Compare the average deviation and the standard deviation of the
following frequency distribution, using for the standard deviation
formula (26) with the Charlier check:

Semester Hours of Mathematics Taken by 66 Students in a Class of
Elementary Social Statistics

  Semester Hours     Students

    43.5-46.4            1
    40.5-43.4            0
    37.5-40.4            0
    34.5-37.4            0
    31.5-34.4            0
    28.5-31.4            2
    25.5-28.4            2
    22.5-25.4            5
    19.5-22.4            4
    16.5-19.4            8
    13.5-16.4           13
    10.5-13.4           26
     7.5-10.4            4
     4.5- 7.4            1

    Total               66


3. Use the coefficient of variation, V, to measure the
representativeness of the mean of the distribution in Exercise 2, above.

4. Below are two random samples of family incomes in a certain city,
one taken in 1928, the other in 1932. Did the depression reduce or
increase the spread in income between families?

                       Number of families
    Income             1928          1932

  Under $500              5            76
    500-  999            15           123
  1,000-1,499           115           155
  1,500-1,999       [illegible]        91
  2,000-2,499            82       [illegible]
  2,500-2,999            63            52
  3,000-3,499            27            17
  3,500-3,999            19            12
  4,000-4,499       [illegible]         7
  4,500-4,999             6             3
  5,000-5,499             3             1

    Total               535           607























5. Using the standard deviations found for the 1928 and 1932 series
in Exercise 4, compute the standard deviation for the two series
combined.

6. The table below shows the number of children who required the
specified numbers of hours of social contact before they were "accepted"
in a certain play group. (a) What percentage of the children took less
than 4 hours? (b) What percentage of the children took more than
10 hours? (c) How many hours did three-fourths of the children
require less than? (d) How many hours did three-fourths of the children
require more than?

    Hours      Children

    18-19          1
    16-17          3
    14-15          2
    12-13          6
    10-11         10
     8- 9          9
     6- 7          8
     4- 5          6
     2- 3          3
     0- 1          2

    Total         50


7. Given two independent scales, X and Y, for the measurement of
"cooperation" between members of a random sample of urban families.
Family A has a score of +1.2 on scale X, family B has a score of 86 on
scale Y. Reduce these scores to as nearly comparable terms as you can.

     Scale X        Families        Scale Y      Families

  -2.5 to -2.9          4             0- 9           21
  -2.0 to -2.4         12            10-19           68
  -1.5 to -1.9         22            20-29       [illegible]
  -1.0 to -1.4         45            30-39          140
  -0.5 to -0.9         71            40-49          131
  -0.0 to -0.4         89            50-59           91
  +0.0 to +0.4        116            60-69           74
  +0.5 to +0.9        132            70-79           66
  +1.0 to +1.4        151            80-89           28
  +1.5 to +1.9         93            90-99           13
  +2.0 to +2.4         60
  +2.5 to +2.9         17

     Total            812            Total          731













8. Suppose that the frequencies for the Y scale in Exercise 7 are
reversed end for end of the scale, while those for the X scale remain as
they are. Convert these scores to a more comparable basis.




CHAPTER IX 


COMBINATION, PROBABILITY, AND THE NORMAL 
DISTRIBUTION 


1. Permutations and Combinations.¹—It is often desirable in
sociological investigations to know the total number of ways in 
which a certain event can occur. For example, in a study of 
intercity migration among five cities, how many paths can the 
migration take? Or, among 10 girls in a boarding school, three 
two-girl friendships are found. How many such friendships are 
possible in this group? The same kind of problem arises in 
connection with the binomial formula, discussed in Sec. 3 below. 

To answer the question about the paths of migration, we
notice that since a migrant may go from any of the five cities
to any of the four remaining cities, the number of paths must be
5 × 4 = 20. Not only do we count each pair of cities, but also
the two orders or arrangements in which the members of a pair
may be taken, as "from a to b," and "from b to a." A pair of
cities in a given order, e.g., "from a to b," is called a permutation,
and the general formula provided by algebra for finding the
number of permutations of n things taken r at a time is²

\[ {}_nP_r = \frac{n!}{(n - r)!}. \tag{36} \]

For the problem above, we substitute in the formula, and get

\[ {}_5P_2 = \frac{5!}{(5 - 2)!} = \frac{5 \times 4 \times 3!}{3!} = 5 \times 4 = 20, \]

as before.

Formula (36) is based on Theorem 1. 

¹ For a fuller treatment of this subject, see any text in college algebra,
e.g., H. B. Fine, College Algebra, Chap. XXV, Ginn and Company, Boston,
1904.

² n! is called "n factorial," and means the product of all consecutive
numbers from 1 through n. For example, 4! = 4 × 3 × 2 × 1 = 24.


Theorem 1. If an event A can occur in m ways, and thereafter 
an event B can occur in n ways, A and B can occur together in the 
order named in mn ways. 

A first approach to the problem of the boarding school
friendships mentioned above can also be made by means of formula
(36). The number of arrangements, or permutations, of 10 girls
taken two at a time is

\[ {}_{10}P_2 = \frac{10!}{(10 - 2)!} = 10 \times 9 = 90. \]

Here, however, there is no interest in the order of the girls in a
two-girl friendship. When this is the case, i.e., when a group
of things is taken without regard for the arrangement of the
members, the group is called a combination. Evidently, each
pair of girls can be arranged in two orders or permutations, so
that the 90 permutations found above reduce to 90/2 = 45
combinations. The formula for combinations is, therefore,

\[ {}_nC_r = \frac{{}_nP_r}{r!} = \frac{n!}{r!\,(n - r)!}. \tag{37} \]

Using it, we get again

\[ {}_{10}C_2 = \frac{10!}{2!\,8!} = \frac{10 \times 9 \times 8!}{2!\,8!} = \frac{10 \times 9}{2 \times 1} = \frac{90}{2} = 45. \]


Although formulas (36) and (37) apply to a large number of
problems, some problems occur that are best approached
independently. As an easy example, suppose we ask, What is the total
number of possible relationships that can exist between two
persons, X and Y, in terms of attraction, indifference, and
repulsion? To each of the three attitudes of X, Y may respond
with three attitudes, so that, by Theorem 1 above, we have
3 × 3 = 9 relationships. These relationships are (1) mutual
attraction between X and Y; (2) X is attracted by Y, but Y is
indifferent to X; (3) X is attracted by Y, but Y is repulsed by
X; (4) mutual indifference between X and Y; (5) X is indifferent
to Y, but Y is attracted by X; (6) X is indifferent to Y, but Y is
repulsed by X; (7) mutual repulsion between X and Y; (8) X is
repulsed by Y, but Y is indifferent to X; and (9) X is repulsed by
Y, but Y is attracted by X.
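
The counts obtained in this section from formulas (36) and (37) and from Theorem 1 can be verified with the Python standard library. The sketch below is an added illustration; it assumes Python 3.8 or later, in which math.perm and math.comb are available, and the variable names are hypothetical.

```python
import math
from itertools import product

# Formula (36): permutations of n things taken r at a time.
migration_paths = math.perm(5, 2)        # five cities, ordered pairs
ordered_pairs_of_girls = math.perm(10, 2)

# Formula (37): combinations of n things taken r at a time.
possible_friendships = math.comb(10, 2)

# Theorem 1: each of X's three attitudes may be met by three of Y's.
attitudes = ("attraction", "indifference", "repulsion")
relationships = list(product(attitudes, repeat=2))

print(migration_paths)          # 20
print(ordered_pairs_of_girls)   # 90
print(possible_friendships)     # 45
print(len(relationships))       # 9
```

Each element of relationships is a pair (attitude of X, attitude of Y), so the list enumerates the nine cases named in the text.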





2. Probability.—Chance, often called "luck," and the tricks
it plays are known to everyone. In a hand at cards, one may
draw no ace; one, two, three, or even all four aces. Whether a
person is male or female, white or black, European or American,
is, as far as he is concerned, purely an accident. The occupation
one follows, the person one marries, the state of one's health,
and so on, are also subject to a great amount of chance.
Discovery and invention, even the trend in the development of a
nation’s culture in the sociological sense, depend in part on 
thousands of small forces of which we have no knowledge. 
If the birth rates in a city differ in 1939 and 1940, is it because 
fundamental conditions affecting fertility have changed, or is 
the variation due merely to accidental factors that will cancel 
out over several years? In one random sample of old people 
there may be more male than female survivors and in another 
sample exactly the reverse, regardless of the true proportion 
in the population. It is, therefore, not surprising that any 
careful attempt to investigate social life or culture is obliged 
to reckon with this element of chance. Chance distorts the 
findings of research, and must be allowed for. 

One of the greatest practical contributions of mathematics 
has been its discovery, beneath apparent confusion, of a remark¬ 
able regularity in the occurrence of chance events. By mathe¬ 
matical means, we can estimate the amount of variation due to 
chance and predict the number of occurrences of any event whose 
probability is known, e.g., the annual deaths in a class of insur¬ 
ance risks. On these mathematical laws of probability are 
founded great business enterprises like insurance, as well as the 
basic techniques of a vast amount of scientific and industrial 
research. 

The exact mathematical definition of probability is this: If an
event can succeed in m ways and fail in m' ways, all equally
likely and mutually exclusive, and the event must either succeed
or fail, the probability of its succeeding is

\[ p = \frac{m}{m + m'} \tag{38} \]

and that of its failing is

\[ q = \frac{m'}{m + m'} \tag{39} \]

That is,

\[ p + q = \frac{m + m'}{m + m'} = 1. \tag{40} \]

In other words, since an event must either succeed or fail, the
probability of certainty is one in one, or unity.

The proportion of ways in which an event can succeed may be 
determined for practical purposes by one of two methods, or by 
both. In the case of a penny, we decide that the probability of
throwing a head is 1/2 by reasoning that the penny has only two
sides and is equally balanced so that one of them is as likely to
turn up as the other. This is an illustration of the theoretical or 
a priori method. By the so-called empirical method, the chance 
of death within a year of a white male, aged 30, engaged in a 
clerical occupation, married, and an medical risk, is found 
by simply counting the proportion of annual deaths occurring 
among a very large number of such individuals (say, 354 deaths 
among 85,707 persons, giving a probability of 0.00413). The 
empirical method is sound if the probability tends to approach a 
limit, as the estimate is based on an ever-increasing number of 
cases under essentially the same conditions. In both methods, 
of course, it is supposed that the conditions under which the 
probability was obtained will hold approximately for all situa¬ 
tions to which the probability is applied. For example, if each 
added count of deaths in a risk group like that described above 
causes the average probability of death to approach nearer to 
some figure 0.00400, then 0.00400 may be regarded as an approxi¬ 
mation of the true (expected) proportion that exists in the given 
class as a whole (an infinite universe). But it would obviously 
be wrong to apply this death rate to a class in which the age was 
40 instead of 30 years! 
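
The a priori and empirical methods may be contrasted in a brief Python sketch. It is offered only as an illustration: the function names are hypothetical, and the simulated penny, driven by a pseudo-random generator with a fixed seed, is an assumption standing in for actual tosses.

```python
import random
from fractions import Fraction


def probability(m, m_prime):
    """Formula (38): p = m / (m + m'), for equally likely ways."""
    return Fraction(m, m + m_prime)


# A priori method: one way to succeed (head), one way to fail (tail).
p = probability(1, 1)
q = 1 - p                      # formula (40): p + q = 1
print(p, q)                    # 1/2 1/2

# Empirical method: the mortality figures quoted in the text.
empirical = 354 / 85707
print(round(empirical, 5))     # 0.00413


def simulated_proportion_of_heads(tosses, seed=0):
    """Proportion of heads among simulated tosses of a balanced penny."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses


# The empirical estimate tends toward the a priori value as tosses increase.
for tosses in (100, 10_000, 100_000):
    print(tosses, simulated_proportion_of_heads(tosses))
```

With few tosses the proportion of heads may stray noticeably from one-half; with many it settles near that limit, which is the condition the text gives for the soundness of the empirical method.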

Two basic theorems of probability are

Theorem 2. Of two mutually exclusive¹ events, A and B,
if the event A has a probability of occurring, p, and the event B
has a probability of occurring, p', the probability that either A or
B will occur in one possible way is p + p'.

¹ Two events are mutually exclusive when in a single trial only one of
them can happen. In a hand at cards, drawing an ace and drawing a jack
are mutually exclusive events, but drawing an ace and drawing a diamond
are not, because both may appear on the same card. If the two events are
not mutually exclusive, the probability is p + p' − pp'.





Theorem 3. If an event A has a probability of occurring, p, and
an event B has a probability of occurring with or after A in one
possible way, p', the probability that both A and B will so occur is pp'.
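
Theorems 2 and 3 can be checked by direct enumeration of a deck of cards, using the examples of the footnote to Theorem 2. The Python sketch below is an added illustration with hypothetical names; it assumes a standard deck of 52 equally likely cards and a single draw.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))     # 52 equally likely cards


def prob(event):
    """Proportion of cards in the deck for which the event holds."""
    favourable = sum(1 for card in DECK if event(card))
    return Fraction(favourable, len(DECK))


def is_ace(card):
    return card[0] == "A"


def is_jack(card):
    return card[0] == "J"


def is_diamond(card):
    return card[1] == "diamonds"


p_ace, p_jack, p_diamond = prob(is_ace), prob(is_jack), prob(is_diamond)

# Theorem 2: ace and jack are mutually exclusive, so probabilities add.
p_ace_or_jack = prob(lambda c: is_ace(c) or is_jack(c))
print(p_ace_or_jack == p_ace + p_jack)                      # True

# Not mutually exclusive: the footnote's p + p' - pp'.
p_ace_or_diamond = prob(lambda c: is_ace(c) or is_diamond(c))
print(p_ace_or_diamond == p_ace + p_diamond - p_ace * p_diamond)  # True

# Theorem 3: both an ace and a diamond on the same card.
p_ace_and_diamond = prob(lambda c: is_ace(c) and is_diamond(c))
print(p_ace_and_diamond == p_ace * p_diamond)               # True
```

In this example the product pp' equals the probability of the joint event because rank and suit are independent in a full deck; the sketch makes no claim for cases where that does not hold.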
A first application of Theorem 3 may be made to a typical
problem. A community is inhabited by two groups of different
nationality and religious backgrounds, Swedish Lutherans and
German Catholics. Among the Lutherans in the age class 40 to
45 years are 40 females and 44 males, among the Catholics 62
females and 58 males, all married to someone included in the
enumeration. The records show 18 mixed marriages, 11 between
Lutheran males and Catholic females, and 7 between Catholic
males and Lutheran females. How does this observation
compare with the number of mixed marriages that would be expected
if there were no prejudice for or against them in the community?
We set up the totals of Table 43. By the definition on page 145,
the probability of a marriage occurring in row (1) is
\({}_1n/N = 62/102\), and of a marriage occurring in col. (1) is
\(n_1/N = 44/102\). By Theorem 3, the probability of a marriage
occurring in both row (1) and column (1) is
\(({}_1n/N)(n_1/N) = (62/102)(44/102)\); therefore the
expected number of Catholic women marrying Lutheran men
is \(({}_1n/N)(n_1/N)(N) = {}_1n\,n_1/N = 62(44)/102 = 26.7\). This
expected frequency is entered in the proper cell in the table.
Similarly, the expected frequency in the cell common to row (2)
and column (2) is 40(58)/102 = 22.7. Thus the total number of
expected mixed marriages is 26.7 + 22.7 = 49.4, or
approximately 49; whereas, the observed number is 18, only 36 per cent
of the expected number. Evidently, there are obstacles in the
way of marriages between the Swedish Lutherans and the German
Catholics in this community.

Table 43.—Fourfold Table for Determining Probability of Mixed
Marriages

                                   Males
    Females           Lutheran       Catholic        Total
                        (1)            (2)            (3)

    Catholic (1)    ₁f₁ = 26.7     ₁f₂ = 35.3      62 = ₁n
    Lutheran (2)    ₂f₁ = 17.3     ₂f₂ = 22.7      40 = ₂n

    Total (3)       n₁ = 44        n₂ = 58         102 = N










This conclusion may be more fully established by applying the
chi-square (\(\chi^2\)) method to Table 44. This method is designed
to test the hypothesis that the differences between a set of
observed and expected frequencies may be due solely to chance.

To obtain \(\chi^2\), we subtract each expected frequency (\(f_t\)) from
the corresponding observed frequency (\(f_o\)), divide the squared
difference by the expected frequency, and sum these ratios.
The calculations are shown in Table 44.


Table 44.—Chi-square (χ²) Test

                                 Marriages
  Females     Males       Observed   Theoretical   f_o - f_t   (f_o - f_t)²   (f_o - f_t)²/f_t
                           (f_o)       (f_t)*

  Catholic    Lutheran       11         26.7         -15.7        246.5            9.23
  Catholic    Catholic       51         35.3         +15.7        246.5            6.98
  Lutheran    Lutheran       33         17.3         +15.7        246.5           14.25
  Lutheran    Catholic        7         22.7         -15.7        246.5           10.86

                                                                       χ² =       41.32

* If any theoretical cell frequency is less than five, a correction is needed.
See Paul Rider, An Introduction to Modern Statistical Methods, pp. 112-113,
John Wiley & Sons, Inc., New York, 1939. 
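
The expected frequencies of Table 43 and the chi-square of Table 44 can be recomputed as follows. The Python sketch is an added illustration with hypothetical names; it carries the expected frequencies unrounded, so its chi-square (about 41.6) differs slightly from the 41.32 obtained in Table 44 from frequencies rounded to one decimal. The value 6.635 is the figure this section quotes from Appendix Table 2 for one degree of freedom.

```python
# Observed marriages; rows are wives, columns are husbands.
OBSERVED = [
    [11, 51],   # Catholic females: Lutheran husbands, Catholic husbands
    [33, 7],    # Lutheran females: Lutheran husbands, Catholic husbands
]


def expected_frequencies(observed):
    """Expected cell = (row total)(column total) / N, as in Table 43."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Sum over all cells of (f_o - f_t)^2 / f_t."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_frequencies(OBSERVED)
chi2 = chi_square(OBSERVED, expected)
degrees_of_freedom = (len(OBSERVED) - 1) * (len(OBSERVED[0]) - 1)
CRITICAL_1_PERCENT = 6.635   # one degree of freedom, as quoted in the text

print([[round(e, 1) for e in row] for row in expected])
# [[26.7, 35.3], [17.3, 22.7]]
print(degrees_of_freedom)            # 1
print(chi2 > CRITICAL_1_PERCENT)     # True
```

The conclusion is unaffected by the rounding: the computed value far exceeds 6.635.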


It was seen above that the expected frequencies used in Table
44 were calculated from the row and column totals of the observed
frequencies in Table 43. This means that the observed and
expected frequencies in the cells of Table 44 were to a certain
extent made to agree. Evidently, this forced agreement should
be allowed for in testing the amount of difference between the
two sets of frequencies. In any 2 × 2 table, like Table 43,
it is clear that if the row totals, the column totals, and one
observed cell frequency are given, the other three cell frequencies
are at once determined. Therefore, only one cell frequency is
free of the influence of the marginal totals, so that a 2 × 2 table
is said to have one degree of freedom.¹ If now the value of \(\chi^2\)
obtained is referred to a table of \(\chi^2\), such as Appendix Table 2,
that takes account of degrees of freedom, the spurious
resemblance between the observed and expected frequencies to which
we objected above is corrected for.

¹ The degrees of freedom for any contingency table are (c − 1)(r − 1),
where c is the number of columns and r is the number of rows. See A. E.
Treloar, Elements of Statistical Reasoning, pp. 215 and 229, John Wiley &
Sons, Inc., New York, 1939. A contingency table is a table of frequencies
divided according to two or more principles of classification, such as the
table in Exercise 7 at the end of this chapter.

Entering Appendix Table 2 with one degree of freedom, then,
we find that a \(\chi^2\) as large as 6.635 could occur by chance once in
100 times, the theory involved here being similar to that described
in the latter part of Sec. 4, below. Our \(\chi^2\) is 41.32, which is
much larger, and would occur by chance less often than once in
100 times. Since it is customary to reject chance as the
explanation of an event that can happen by chance no oftener than five
times in 100, we conclude that the frequency of mixed marriages
in the community cited is reduced by sociological and perhaps
economic forces.

The classic method of introducing the elementary notions of
probability is to use the illustration of coin tossing. The event
is the occurrence of a "head" or a "tail." We may toss one
coin several times, several coins once, or several coins several
times, as we wish. It is evident that the events are mutually
exclusive, as specified in Theorem 2, above. We may also
assume that all the coins tossed during the experiment are
exactly alike in size, weight, shape, and balance, i.e., in respect
to every fixed or biased factor that affects the tendency of heads
or tails to fall uppermost when the coin is tossed. In this
way we meet the requirement that each event of a probability
set² shall be equally likely. Differently expressed, it is assumed
that the probability, p, of throwing a head is the same for every
penny at each throw, and that every penny at each throw is
independent of every other penny, i.e., there is no tendency for
one penny to show heads or tails because another does or does
not, as would happen if they were stuck together. Finally,
of the two events that can occur, one, heads, we call a success,
and the other, tails, we call a failure. Having specified these
conditions, our first question is, What is the probability of
throwing a head, or of getting a success, at any one toss of a
penny? In other words, what is the value of p?

² A "probability set" is described by the denominator of formula (38)
or (39).

Since in a single toss of one penny there is only one way in
which a success can occur and one way in which a failure can
occur, and we assume that the pennies are balanced so that these
two events are equally likely, we have, in the notation introduced
above, m = m' = 1. Hence, from formulas (38) and (39),
p = q, and substituting p for q or q for p in formula (40), we
find that p = q = 1/2 = 0.5.

Suppose that we throw 10 pennies, and want to know the 
probability of getting exactly eight of a kind, i.e., eight heads or 
eight tails. If eight of 10 pennies show heads, then the other 
two must show tails, or vice versa. We just saw that if we throw 
one penny, the probability of getting a head in one throw is 
p = .5. By Theorem 3, above, the probability of eight successes 
occurring in one possible way is p^ = (.5)®, the probability of two 
failures occurring in one possible way is q‘^ = (.5)^, and the 
probability of these two events occurring together in one possible 
way is p^q^ == (.5)®(.5)2. But the eight heads may occur among 
the 10 pennies in several possible ways, so that by Theorem 2 the 
probability of occurrence in just one way should be summed as 
many times as there are possible ways, or, more briefly, multiplied 
by the number of possible ways. How many possible ways are 
there? This is equivalent to asking, In how many ways may we 
get eight heads from 10 pennies, or, how many possible combina¬ 
tions are there of 10 (= n) things taken eight (= r) at a time? 
To answer this, we already have formula (37) above, which for 
our problem gives¹ 

$$ {}_{10}C_{8} = \frac{10!}{2!\,8!} = \frac{10(9)(8)(7)(6)(5)(4)(3)(2)(1)}{(8)(7)(6)(5)(4)(3)(2)(1)(2)(1)} = 45. $$

Hence the probability, P, of getting exactly eight heads in a single 
throw of 10 pennies is 

$$ P = {}_{n}C_{r}\, p^{r} q^{n-r}, \tag{41} $$

or 

$$ P = 45(.5)^{8}(.5)^{2} = 45(.5)^{10}. $$

Using logarithms,² we find 

$$ \log (.5)^{10} = 10 \log .5 = 10(9.69897 - 10) = 96.98970 - 100. $$

¹ See Appendix Table 3. For an extensive table of factorials or their 
logarithms, see T. C. Fry, Probability and Its Engineering Uses, pp. 427-438, 
D. Van Nostrand Company, Inc., New York, 1928. A briefer table is given in 
Mathematical Tables from Handbook of Chemistry and Physics, 5th ed., p. 
180, Chemical Rubber Publishing Company, Cleveland. 

² See Appendix Table 7 and accompanying Foreword. 



COMBINATION, PROBABILITY, NORMAL DISTRIBUTION 161 


The antilogarithm of this is .0009766. Hence 
P = 45(.0009766) = .044. 

That is, in 44 out of 1,000 trials we would expect by the laws 
of chance to get exactly eight heads in a toss of 10 pennies. 
Similarly, by Theorem 2, the probability of getting either eight 
heads or eight tails is 2 X 0.044= 0.088. This last is the proba¬ 
bility that answers our question. Any similar question can be 
readily answered by substituting in formula (41), above. 
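
The substitution in formula (41) can also be checked by machine rather than by logarithms. The following is a minimal Python sketch; the function name `binomial_probability` is an assumption of this illustration and does not come from the text.

```python
from math import comb

def binomial_probability(r, n, p):
    """Probability of exactly r successes in n trials, as in formula (41)."""
    q = 1.0 - p
    return comb(n, r) * p**r * q**(n - r)

# Exactly eight heads in one toss of 10 balanced pennies.
p_eight_heads = binomial_probability(8, 10, 0.5)
# Eight heads or eight tails: mutually exclusive events, so Theorem 2 adds them.
p_eight_of_a_kind = 2 * p_eight_heads

print(round(p_eight_heads, 4))      # 0.0439
print(round(p_eight_of_a_kind, 4))  # 0.0879
```

The small differences from .044 and .088 in the text arise only from rounding at different stages.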

3. The Binomial Distribution.—We often want to know the 
probability of getting as many as or more than a specified number 
of successes or failures. From what has been said, it will be 
seen that the probability of getting no successes at all in a toss 
of n pennies is $q^n$, of getting one success is ${}_{n}C_{1}\, p q^{n-1}$, of getting 
two successes is ${}_{n}C_{2}\, p^2 q^{n-2}$, and so on, and, finally, the probability 
of getting all successes is $p^n$. Since these combinations of events 
exhaust the possibilities, some one of them is certain to occur 
at any toss of n pennies. In other words, the probability of one 
or another of them occurring is unity, or one. By Theorem 2, 
we may therefore write the equation 


$$ q^n + {}_{n}C_{1}\, p q^{n-1} + {}_{n}C_{2}\, p^2 q^{n-2} + {}_{n}C_{3}\, p^3 q^{n-3} + \cdots + {}_{n}C_{r}\, p^r q^{n-r} + \cdots + p^n = 1. \tag{42} $$

But by formula (40), p + q = 1, and hence $(p + q)^n = 1$. It 
therefore appears that by substitution 

$$ (q + p)^n = q^n + {}_{n}C_{1}\, p q^{n-1} + {}_{n}C_{2}\, p^2 q^{n-2} + \cdots + {}_{n}C_{r}\, p^r q^{n-r} + \cdots + p^n. \tag{43} $$

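If p = q = 1/2, the formula simplifies to 

$$ \left(\tfrac{1}{2} + \tfrac{1}{2}\right)^n = \left(\tfrac{1}{2}\right)^n \left(1 + {}_{n}C_{1} + {}_{n}C_{2} + \cdots + {}_{n}C_{n-1} + 1\right). \tag{44} $$

This is the familiar binomial expansion of algebra, which is now 
seen to be an expression of the operation of the laws of chance.¹ 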


¹ In algebra, the binomial formula is usually written: 

$$ (q + p)^n = q^n + \frac{n}{1}\, q^{n-1} p + \frac{n(n-1)}{1 \cdot 2}\, q^{n-2} p^2 + \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3}\, q^{n-3} p^3 + \cdots + p^n, $$

and it is pointed out that the exponent of q decreases by 1, while the 
exponent of p increases by 1, each term; and that the coefficient of any term, if 
multiplied by the exponent of q and divided by the number of the term, gives 
the coefficient of the next term. 





To discover the probability of getting, say, eight or more heads 
in a single toss of 10 pennies, therefore, we need only apply the 
binomial. The probability of getting eight or more heads means, 
specifically, the probability of getting eight, nine, or 10 heads; 
and by Theorem 2, this is equal to the sum of the probabilities of 
the three separate events. By formula (41), which is the general 
term of the binomial, the probability of eight heads is ${}_{10}C_{8}\, p^8 q^2$, 
of nine heads is ${}_{10}C_{9}\, p^9 q$, and of 10 heads is $p^{10}$. Summing these, 

$$ P = {}_{10}C_{8}\, p^8 q^2 + {}_{10}C_{9}\, p^9 q + p^{10} = 45(.5)^8(.5)^2 + 10(.5)^9(.5) + (.5)^{10} $$

$$ = (.5)^{10}(45 + 10 + 1) = .0440 + .0098 + .0010 = 0.055. $$

Accordingly, in 1,000 throws of 10 pennies, we may expect to get 
eight, nine, or 10 heads about 55 times. And the probability of 
getting eight or more heads or eight or more tails is, of course, 
2 X .055 = .11, or 11 times in 100 throws. Notice that this is 
merely the most probable number and will vary from one set of 
100 throws of 10 pennies each to another. But in a very large 
number of throws the average proportion should come rather 
close to 11 per 100 throws of 10 pennies each. 
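
The summation of the upper terms of the binomial can be sketched in Python as follows. The helper name `binomial_tail` is an assumption made for this illustration only.

```python
from math import comb

def binomial_tail(r_min, n, p):
    """Probability of r_min or more successes in n trials (sum of terms of formula 41)."""
    q = 1.0 - p
    return sum(comb(n, r) * p**r * q**(n - r) for r in range(r_min, n + 1))

p_heads = binomial_tail(8, 10, 0.5)  # eight, nine, or 10 heads: 56/1024
p_either = 2 * p_heads               # eight or more heads, or eight or more tails

print(round(p_heads, 4))   # 0.0547
print(round(p_either, 4))  # 0.1094
```

Carried to four places, the result 0.1094 is the unrounded form of the .11 obtained above.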

Suppose, again, we throw 10 pennies 150 times. In how 
many of these trials may we expect to get exactly eight of a kind? 
Since we have found this probability to be 0.088, we may expect 
this event in the proportion of about nine times in 100 trials, 
in the long run. If N represents the number of trials, and S 
the number of trials in which the specified event may be expected 
to happen, the formula is approximately 

S = PN. (45) 

Substituting P = .088 and N = 150 in this formula, we find 
S = .088(150) = 13.2. That is, in 150 tosses of 10 pennies each, 
about 13 is the most probable number of tosses that will show 
exactly eight heads or eight tails. 

Similarly, if it is wanted to know the frequency with which each 
possible number of successes, from 0 to n, may be expected to 
occur by chance in N trials of n events each, each term of the 
binomial expansion in formula (42) or (43) is simply multiplied 
by N: 

$$ q^n N + {}_{n}C_{1}\, p q^{n-1} N + {}_{n}C_{2}\, p^2 q^{n-2} N + \cdots + {}_{n}C_{r}\, p^r q^{n-r} N + \cdots + p^n N = N. \tag{46} $$




Thus, if we throw 10 pennies 1,000 times, we have 

$$ 1{,}000(.5 + .5)^{10} = .00098(1{,}000) + .00977(1{,}000) + .04395(1{,}000) + .11719(1{,}000) + .20508(1{,}000) + .24609(1{,}000) $$

$$ + .20508(1{,}000) + .11719(1{,}000) + .04395(1{,}000) + .00977(1{,}000) + .00098(1{,}000), $$

or 

$$ .98 + 9.77 + 43.95 + 117.19 + 205.08 + 246.09 + 205.08 + 117.19 + 43.95 + 9.77 + .98 = 1{,}000. $$

This is really a binomial frequency distribution¹ and is so 
arranged in Table 45. From this table, we see that out of 1,000 
tosses of 10 pennies each, we would expect no heads in only 
about one toss, one head in something like 10 tosses, two heads 
in approximately 44 tosses, and so on. 


Table 45.—Frequency Distribution of 1,000 Tosses of 10 Pennies 

    Number of       Number of 
    Heads (X)       Tosses (f) 
        0               .98 
        1              9.77 
        2             43.95 
        3            117.19 
        4            205.08 
        5            246.09 
        6            205.08 
        7            117.19 
        8             43.95 
        9              9.77 
       10               .98 
    Total          1,000.00 
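
The frequencies of Table 45 follow from formula (46) and may be reproduced with a short Python sketch. The function name `expected_frequencies` is an assumption of this illustration.

```python
from math import comb

def expected_frequencies(n, p, trials):
    """Expected number of trials showing 0, 1, ..., n successes, as in formula (46)."""
    q = 1.0 - p
    return [trials * comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

freqs = expected_frequencies(10, 0.5, 1000)
for heads, f in enumerate(freqs):
    print(f"{heads:2d} {f:8.2f}")
print("total", round(sum(freqs), 2))
```

The unrounded terms sum to exactly 1,000; the entries of the table, being rounded to two places, sum to very slightly more.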


Let us now pass from the theoretical case of penny tossing to 
some problem that might arise in social research. For example, 
the proportion of males in the urban population of Wisconsin 
in 1930 was $p_1$ = 0.4974; in the rural nonfarm population, 
$p_2$ = 0.5118; and in the rural farm population, $p_3$ = 0.5435. 
If we regard the three populations—urban, rural nonfarm, and 
rural farm—as ranked in the order of urbanness, and if we sub¬ 
tract the proportion of males in the less urban from that in the 
more urban of each of the three possible pairings of these popula¬ 
tions, we get 

$$ p_1 - p_2 = .4974 - .5118 = -.0144. $$

$$ p_1 - p_3 = .4974 - .5435 = -.0461. $$

$$ p_2 - p_3 = .5118 - .5435 = -.0317. $$

¹ See Chap. V. 

We notice that all three of the signs are negative. In the case of 
another Middle Western state taken at random, the same result 
was found. Does this mean that the proportion of males is really 
greater in the more rural populations, or may the negative signs 
in the two states be just a trick of chance? By formula (36), we 
see that there are 3! = 6 possible orders of relative magnitude 
that $p_1$, $p_2$, and $p_3$ can take ($p_1 < p_3 < p_2$; $p_2 < p_1 < p_3$; 
etc.), if we assume that they are never equal (i.e., $p_1 \neq p_2 \neq p_3$). 
Since the order observed, $p_1 < p_2 < p_3$, is only one of the six, the 
probability that it will occur in one random trial (or state) is 1/6. 
Hence the probability of getting only negative signs in both states 
is 

$$ {}_{2}C_{2} \left(\tfrac{1}{6}\right)^2 \left(\tfrac{5}{6}\right)^0 = \left(\tfrac{1}{6}\right)^2 = \tfrac{1}{36} = .028 $$

by formula (41). 

Statisticians usually insist that the probability of a result 
arising by chance be no greater than 5 in 100, or 0.05, 
before they will risk the assumption that it is not due to 
chance. By this standard, since .028 is less than .05, we eliminate 
chance in the present case, and are entitled to conclude that the 
proportion of males in the three populations is related to the 
degree of urbanness in those populations. 
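
The counting argument of this example can be verified by listing the possible orders directly. The sketch below is in Python; the labels given to the three populations and the variable names are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import permutations

# All possible rank orders of the three proportions, ties being excluded.
populations = ("urban", "rural nonfarm", "rural farm")
orders = list(permutations(populations))

# The order observed: p1 < p2 < p3, i.e., smallest proportion first.
observed = ("urban", "rural nonfarm", "rural farm")

p_one_state = Fraction(orders.count(observed), len(orders))
p_two_states = p_one_state ** 2   # two independent states

print(len(orders), p_one_state, p_two_states)  # 6 1/6 1/36
print(round(float(p_two_states), 3))           # 0.028
```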

In many situations similar to this, the binomial theorem 
enables us to determine the probability that repeated events may 
occur by chance alone, and to note whether or not the probability 
is so small that we may reject the hypothesis that chance is 
responsible. 

It is important to ask what is meant by chance in the preceding 
illustration. If we regard the census figures for the three popula¬ 
tions as representing three complete universes, there is no 
question of chance at all. Any differences noted in the propor¬ 
tions of males, however small they may be, are real differences 
between the universes, and that is the end of the matter. But 
if we think of the proportion of males in each of our three 
populations as determined by a separate set of causes acting to 
produce sample results, and if we want to know whether or not these 
three sets of forces differ in any real way from one another, the 
problem of chance at once enters. By chance we mean a great 
number of small, unknown factors acting in many directions, as 
contrasted with large (biased)¹ factors, usually known or knowable, 
acting constantly in the same direction. If the biased factors 
affecting the proportion of males differ from one of the three 
populations to another— e.g., more females than males migrate 
from rural to urban areas—the observed proportions of males will 
differ to a greater extent than can be accounted for by the action 
of small random forces. If the biased factors that produce the 
proportion of males in each of the three populations are essentially 
the same, however, any variation in the proportion of males 
from one population to another must be due to chance factors 
alone. It is usually good research method to seek to eliminate 
chance as a possible cause of differences before undertaking to 
discover what factors are responsible. 

If we already know from independent evidence, however, that 
important factors influencing the proportion of males varied 
between the three populations—the two sexes migrated 
unequally from the more-rural to the less-rural areas—there 
would be no point in testing the hypothesis that the differences 
were due to chance, except perhaps to confirm the a priori 
knowledge. When such a test fails to eliminate chance, it often 
means only that a larger sample is needed. It may sometimes be 
advisable to investigate carefully the biased factors in the situa¬ 
tions under comparison, even when chance has not been elimi¬ 
nated as a possible cause of the differences observed between them. 

A binomial distribution, such as that of Table 45, is like other 
distributions in having a mean, a standard deviation, and other 
statistical constants by which it may be described. The formulas 
for the mean and the standard deviation are 

$$ M_B = np, \tag{47} $$

$$ \sigma_B = \sqrt{npq}, \tag{48} $$

where the symbols have the same meanings as above.² 

For the distribution of Table 45, $M_B$ = 10(.5) = 5 heads, and 
$\sigma_B = \sqrt{10(.5)(.5)}$ = 1.58 heads. 

¹ See also third paragraph on p. 149, above. 

² For a derivation of these formulas see, for example, C. H. Richardson, 
An Introduction to Statistical Analysis, pp. 228-230, Harcourt, Brace and 
Company, Inc., New York, 1934. The subscript, B, means binomial. 
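
Formulas (47) and (48) may be checked against the mean and standard deviation computed in the ordinary way from the distribution of Table 45. The following Python sketch does so; the variable names are assumptions of this illustration.

```python
from math import comb, sqrt

n, p = 10, 0.5
q = 1.0 - p

# Probability of each number of heads, 0 through n.
probs = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

# Mean and standard deviation taken directly from the distribution.
mean = sum(r * pr for r, pr in enumerate(probs))
sd = sqrt(sum((r - mean) ** 2 * pr for r, pr in enumerate(probs)))

print(mean, n * p)                              # direct mean and formula (47)
print(round(sd, 2), round(sqrt(n * p * q), 2))  # direct value and formula (48)
```

Both routes give 5 heads for the mean and 1.58 heads for the standard deviation.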





It is not necessary in chance situations that p and q should be 
equal. Thus, the probability of throwing an ace in a single toss 
of a die is p = 1/6, and the probability of not throwing an ace is 
q = 5/6. If 15 throws are to be made, the binomial is $(5/6 + 1/6)^{15}$, 
and this can be expanded and utilized just as was done above 
for p = q = 1/2. When p = q, the binomial is symmetrical in 
shape; when p ≠ q,* it is asymmetrical or skewed. 

* ≠ means greater or less than. 

4. The Normal Distribution.—Graphs of the binomials 
$32(1/2 + 1/2)^5$ and $1{,}024(1/2 + 1/2)^{10}$ are shown in Fig. 38. Notice that 
they take the form of histograms rather than of smooth curves, 
because successes are counted only in whole numbers, yielding a 
discrete or discontinuous series. However, if the length of the 
scale is kept constant, as in the figure, the graph of the binomial 
$1{,}024(1/2 + 1/2)^{10}$ is seen to be less broken in outline than is that of the 
binomial $32(1/2 + 1/2)^5$. As n increases, the graph approaches closer 
and closer to a smooth curve in appearance. If now n is indefinitely 
increased, giving the binomial $N(1/2 + 1/2)^{\infty}$, it is evident that 
the intervals of the graph become smaller and smaller, until 
in effect the outline merges into that of a smooth curve. 

[Fig. 38.—Histograms of two binomials, as n increases. Panels: $32(1/2 + 1/2)^5$ and $1{,}024(1/2 + 1/2)^{10}$; X axis, heads; Y axis, throws.] 

[Fig. 39.—The normal curve.] 

The resulting curve is the most important type of distribution in 
statistical theory, and is known variously as the normal curve, the 
Gaussian curve, the curve of error, or the curve of probabilities. 
Unlike the binomial distribution, it represents a continuous 
variable, which can take any value whatever, on the X scale. 
A graph of the normal curve is shown in Fig. 39. It may be 
thought of as enclosing a continuous surface, cut from a piece of 
thin sheet metal. Its equation is usually written 

$$ y = \frac{N}{\sigma_x \sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma_x^2}}, \tag{49} $$

where x = X − M, or a mean deviate of X, 
N = total frequency of the distribution, 
π = 3.1416, so that $\sqrt{2\pi}$ = 2.5066, 
e = 2.7183, the base of natural logarithms. 

If the area of the curve is taken as unity, equation (49) becomes 


$$ y = \frac{1}{\sigma_x \sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma_x^2}}. \tag{50} $$


As an aid to understanding the curve represented by equation 
(50), let us analyze its equation. We shall begin by letting 
$x/\sigma_x = t$, so that equation (50) becomes 

$$ y = \frac{1}{\sigma_x \sqrt{2\pi}}\, e^{-\frac{t^2}{2}}. \tag{51} $$


In the calculation of tables of normal ordinates, it is also 
convenient to let $\sigma_x = 1$, giving 

$$ y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}. \tag{52} $$


But, as seen above, π is a mathematical constant with the value 
3.1416, so that $\sqrt{2\pi}$ = 2.5066, and $1/\sqrt{2\pi}$ = .3989. Equation 
(52) may therefore be written 

$$ y = .3989\, e^{-\frac{t^2}{2}}. \tag{53} $$





In Fig. 39, at the mean of the X's on the X axis, x = 0. The 
height of the ordinate at any point is the value of y found from 
equation (53) by substituting the appropriate value of t. At 
x = 0, $t = x/\sigma_x = 0/\sigma_x = 0$, and 

$$ y = .3989\, e^{-\frac{(0)^2}{2}}, $$

$$ y = .3989\, e^{0}. $$

But any number raised to the zero power is 1,* so that 

$$ y = .3989(1) = .3989. $$


In other words, at the mean of the X’s, the height of the ordinate, 
y, is .3989, for any normal curve of unit area and unit standard 
deviation. This is plotted in Fig. 39. 

Next, for the same case, let $t = x/\sigma_x = +2$. Then, by 
formula (53), 

$$ y = .3989\, e^{-\frac{(2)^2}{2}}, $$

$$ y = .3989\, e^{-\frac{4}{2}}, $$

$$ y = .3989\, e^{-2}. $$

But 

$$ e^{-2} = \frac{1}{e^2}, $$

so that 

$$ y = \frac{.3989}{e^2}. $$

It has also been seen that e, like π, is a mathematical constant, 
having the value 2.7183, so that $e^2$ = 7.38906. Hence, at 
$x/\sigma_x = +2$, 

$$ y = \frac{.3989}{7.38906} = .05399. $$


This value is also plotted in Fig. 39. Notice that at $x/\sigma_x = -2$, the 
value of y is the same as at $x/\sigma_x = +2$, for in formula (53) 
evidently $e^{-\frac{(-2)^2}{2}}$ is the same as $e^{-\frac{(+2)^2}{2}}$. The normal curve is thus 
symmetrical, i.e., of the same shape, on each side of the mean. 
From this it follows that the mean and the median coincide. 

* See any text in elementary algebra. 





The student is asked to check the values of y found above at 
$x/\sigma_x = 0$ and at $x/\sigma_x = \pm 2$ against those printed in Appendix 
Table 1. All the values in that table are calculated in this way, 
and may be used to complete the construction of Fig. 39. Thus, 
the height of the ordinates at ±1σ, read from the table, is .2420, 
and is so scaled in the figure. After several ordinates have been 
drawn, they are connected by a smooth line, to form the curve 
shown. 
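
The ordinates used in constructing the curve can be computed from formula (53) by a few lines of Python. This is only a sketch; the function name `normal_ordinate` is an assumption of this illustration.

```python
from math import exp, pi, sqrt

def normal_ordinate(t):
    """Height of the normal curve of unit area and unit sigma at t = x / sigma."""
    return exp(-t * t / 2.0) / sqrt(2.0 * pi)

for t in (0.0, 1.0, -1.0, 2.0, -2.0):
    print(t, round(normal_ordinate(t), 4))
```

The values at t = 0, ±1, and ±2 come to .3989, .2420, and .0540, in agreement with those worked out above, and the equal heights at +t and −t show the symmetry of the curve.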

The tallest ordinate of the normal curve occurs at the mean, 
hence the mean, median, and mode all coincide. This appears 
from the fact that when x = 0, y = .3989; whereas, when 
x ≠ 0, $y = .3989/e^{t^2/2}$. The latter term is always smaller than 
the former, since all positive powers of e are greater than 
1 ($e^0 = 1$). 

Another characteristic of the normal curve is that it is asymp¬ 
totic to the X axis, meaning that the curve constantly approaches 
but never touches the X axis as it extends indefinitely in both 
directions from the mean. 

Table 46 shows a hypothetical normal distribution with 
perfectly symmetrical frequencies. The actual frequencies of 
normal tables may depart in various degrees from this sym¬ 
metrical pattern, because of sampling errors or the use of class 
intervals that do not place the mean of the series exactly at the 
center of the distribution. 

Table 46.—Normal Distribution of Scores on an Army Attitudes Test 
(Hypothetical Data) 

    Scores        Men 
     (X)          (f) 
     0- 4.9         5 
     5- 9.9        17 
    10-14.9        44 
    15-19.9        92 
    20-24.9       150 
    25-29.9       191 
    30-34.9       191 
    35-39.9       150 
    40-44.9        92 
    45-49.9        44 
    50-54.9        17 
    55-59.9         5 
    Total         998 






With the help of the integral calculus, it is possible to find the 
proportion of the area under any part of the normal curve, i.e., 
between the ordinates erected at any two points on the x scale. 
This has been done for the areas between the ordinate at the 
mean and ordinates erected at intervals of .01σ along the x-axis. 
The results are shown in Appendix Table 1, in the column 
headed "Area." Thus the area under the curve between the 
ordinate at the mean and the ordinate at 1σ is seen to be 0.34, or 
34 per cent (roughly one-third) of the total area under the curve. 
In Chap. VI we saw that the area under a frequency histogram, 
where the width of the interval is taken as one unit, is equal to 
the total frequency of the distribution. The same principle holds 
for the normal curve. 

Since the normal curve represents the distribution of frequencies 
in any normal universe, the proportion of the area between 
the ordinate at the mean and the ordinates at, say, x = ±1σ 
represents the most probable proportion of the frequencies of 
any random sample drawn from such a universe that may be 
expected to fall between the values $x/\sigma = 0$ and $x/\sigma = \pm 1$. 
Differently expressed, the proportion of the area between the 
ordinate at the mean and the ordinates at x = ±1σ is the 
probability that a random sample value of X will fall between 
$M_x - 1\sigma$ and $M_x + 1\sigma$. We see from Appendix Table 1 that this 
probability is twice 0.34, which is approximately 0.68, or 68 per cent. It 
should now be clear why in a normal distribution the odds are 
about two to one that a random value of X will be within a 
range of one standard deviation on each side of the mean value of 
X. Also, inasmuch as a value of X falls outside the range of 
$M_x \pm 2\sigma$ by chance only 1.00 − (2 × 0.477) = 0.046, or about one 
time in 20, we shall be fairly safe if we attribute those values that 
do so to something else than chance. In other words, we shall 
arbitrarily regard all such extreme values as significant. 
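
The areas read from Appendix Table 1 can be approximated without the table by means of the error function of the Python standard library. This is a sketch under that assumption; the function name `area_from_mean` is hypothetical.

```python
from math import erf, sqrt

def area_from_mean(t):
    """Area under the unit normal curve between the mean and t sigma."""
    return 0.5 * erf(abs(t) / sqrt(2.0))

print(round(area_from_mean(1.0), 4))          # 0.3413, mean to 1 sigma
print(round(2 * area_from_mean(1.0), 4))      # 0.6827, within 1 sigma of the mean
print(round(1 - 2 * area_from_mean(2.0), 4))  # 0.0455, outside 2 sigma
```

The last figure differs slightly from the 0.046 of the text only because the tabled area 0.477 is rounded to three places.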

Reading again from Appendix Table 1, it is seen that approximately 
25 per cent of the area of the normal curve lies between 
the mean and an ordinate at x = 0.67σ. That is, one-half of the 
area of the curve is included between an ordinate at −0.67σ and 
an ordinate at +0.67σ. From a finer table, the figure is found 
more exactly to be 0.6745σ. The distance 0.6745σ from the 
mean along the x-axis of the normal curve is commonly called the 
probable error (P.E.), and is often used instead of the standard 
deviation, σ, or standard error, as it is called in sampling theory 
(see Chap. XII). 

The relationships of the preceding paragraphs do not hold, 
however, for skewed distributions. This may be seen from Fig. 
41. By comparing the rectangles in the areas M − 1σ and 
M + 1σ, it is clear that in this case a much larger proportion 
of the area of the curve is contained between M and M − 1σ than 
between M and M + 1σ, so that the standard deviation has no constant 
relation to the area or frequency. For this reason, the standard 
deviation has a variable meaning when applied to asymmetrical 
distributions, and should be cautiously interpreted in such cases. 

[Fig. 40.—Standard deviation and area under normal curve.] 

[Fig. 41.—Standard deviation and area under skewed curve.] 

In a normal distribution, A.D. = .80σ, so that the distance 
M ± A.D. on the scale includes about 58 per cent of the 
frequencies (see Appendix Table 1). 

When n is large, the labor of expanding the binomial becomes 
excessive. Under these conditions, if the value of np or nq is not 
too small, say 5 or more, the binomial so closely approximates the 
normal curve that the latter may be used in its stead for purposes 
of estimation, and the desired probabilities simply read from 
Appendix Table 1. 

Consider again the probability of getting eight or more heads 
or tails in a toss of only 10 pennies. In Fig. 39 we erect a 
perpendicular at the point 

$$ \frac{x}{\sigma} = \frac{X - M_B}{\sigma_B} = \frac{X - np}{\sqrt{npq}} = \frac{8 - 10(.5)}{\sqrt{10(.5)(.5)}} = 1.90, $$

and find the area between this ordinate and the ordinate at the 
mean. Entering Appendix Table 1 with x/σ = 1.90, we see 
that the area desired is 0.4713 of the area of the whole curve. 
Subtracting 0.4713 from 0.5000, the area of half the curve, we 
get 0.0287 as the area to the right of the ordinate at x/σ = +1.9. 




Since the normal curve is assumed to represent the results of all 
possible tosses of 10 pennies, the area to the right of x/σ = +1.9 
shows the proportion of tosses that in the long run may be 
expected to give eight or more heads. This proportion is the 
probability of getting eight or more heads in one toss of 10 pennies, 
so the probability of getting eight or more heads or eight or more 
tails is twice this, or P = 2(0.0287) = 0.0574. The true value 
of P as found above from the binomial expansion is P = 0.1094. 
The agreement is thus seen to be none too good when n is as small 
as 10. If n is increased to 15, however, np = 7.5, and we find 
more agreement. The probability of getting, say, 12 or more 
heads is 0.0102 according to the normal curve, and 0.0176 
according to the binomial,¹ the error being only 0.0074. For 
larger values of n, the two estimates may for most purposes be 
accepted as equivalent. 

The approximate probability of getting exactly eight heads or 
eight tails in a toss of n = 10 pennies is the height of the ordinate 
of the normal curve at the point X = 8, expressed in standard 
deviation units. This is because the number 8 is represented 
on the X scale by a point rather than by a distance, and on this 
point can be erected only a straight line, or ordinate, which 
theoretically has no width and hence no area. We now need 
y/σ at X = 8, i.e., at $x/\sigma = \frac{8 - 10(0.5)}{\sqrt{10(0.5)(0.5)}} = 1.9$. From Appendix 
Table 1 we find y = 0.0656, so that $y/\sigma = \frac{0.0656}{\sqrt{10(0.5)(0.5)}} = 0.0415$, 
and 2 × 0.0415 = 0.083 is the probability desired. The correct 
probability already found by the binomial is 0.0879. 

If we choose to consider the normal curve merely as a device for 
approximating the probabilities of the binomial, rather than as a 
continuous mathematical distribution, it becomes possible to 
take certain liberties with it that will improve its accuracy for the 
purpose. For example, to determine the probability of throwing 
eight or more heads or tails in a toss of 10 pennies, we may allow 
the value X = 8 to occupy the area under the normal curve 
between the X values 7.5 and 8.5, and regard the area to the 
right of 7.5 as representing the probability of throwing eight or 
more heads. We may then erect a perpendicular in Fig. 39 

¹ ${}_{15}C_{12}\, p^{12} q^{3} + {}_{15}C_{13}\, p^{13} q^{2} + {}_{15}C_{14}\, p^{14} q + p^{15} = (1/2)^{15}(455 + 105 + 15 + 1) = 0.01758.$ 





at the point 

$$ \frac{x}{\sigma} = \frac{X - np}{\sqrt{npq}} = \frac{7.5 - 10(0.5)}{\sqrt{10(0.5)(0.5)}} = 1.58, $$

and find the area between this ordinate and the ordinate at the 
mean to be 0.4429 (Appendix Table 1). The area to the right 
of the ordinate at 1.58σ is, therefore, 0.5000 − 0.4429 = 0.0571, 
which is the probability of throwing eight or more heads. The 
probability of throwing eight or more heads or eight or more 
tails is 2 X 0.0571 = 0.1142. This result is much closer to the 
correct binomial probability of 0.1094 than was that obtained 
above in the orthodox way. Indeed, the accuracy of the normal 
curve in approximating the binomial has now been made quite 
satisfactory even for n = 10. 
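
The gain from shifting the ordinate to 7.5 can be seen by computing the three estimates side by side. The Python sketch below uses the error function in place of Appendix Table 1, so its figures differ in the third or fourth place from those read from the table; the names `upper_tail`, `plain`, and `corrected` are assumptions of this illustration.

```python
from math import comb, erf, sqrt

def upper_tail(z):
    """Area under the unit normal curve to the right of z."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

n, p = 10, 0.5
mean, sd = n * p, sqrt(n * p * (1 - p))

# Eight or more heads: exact binomial, normal at X = 8, normal at X = 7.5.
exact = sum(comb(n, r) for r in range(8, n + 1)) * p**n
plain = upper_tail((8.0 - mean) / sd)
corrected = upper_tail((7.5 - mean) / sd)

# Doubling gives eight or more heads or eight or more tails.
for label, value in (("binomial", exact), ("normal", plain), ("corrected", corrected)):
    print(label, round(2 * value, 4))
```

The corrected estimate falls within about 0.005 of the binomial value, whereas the uncorrected one is in error by roughly 0.05.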

It is also possible to use a similar manipulation in estimating 
the probability of throwing exactly eight heads or eight tails in a 
toss of 10 pennies. We find from Appendix Table 1 the area 
under the curve included between an ordinate at X = 7.5 and 
an ordinate at X = 8.5. The table gives 0.4864 as the area 
between the mean ordinate and the ordinate at X = 8.5 (i.e., at 
$x = \frac{8.5 - 10(0.5)}{\sqrt{10(0.5)(0.5)}}\,\sigma = 2.21\sigma$), and 0.4429 as the area between 
the mean ordinate and the ordinate at X = 7.5 (i.e., at 
$x = \frac{7.5 - 10(0.5)}{\sqrt{10(0.5)(0.5)}}\,\sigma = 1.58\sigma$). 


Consequently, the area between the ordinate at X = 7.5 and 
the ordinate at X = 8.5 is 


0.4864 - 0.4429 = 0.0435. 

This is the probability of throwing exactly eight heads in a toss 
of 10 pennies; so the probability of throwing exactly eight heads 
or exactly eight tails is 2 X 0.0435 = 0.0870. The error from 
the binomial (0.0879) in this case is negligible. 

It should be noted that special modifications like those above 
in the use of the normal curve are usually worth while only when 
np is small, say np < 6. 

5. Skewness and Kurtosis.—The frequency distributions 
with which social scientists have to deal usually depart 
considerably from the normal form. Such a distribution is shown 
in Table 47 and in Fig. 42. It is readily seen to extend farther 
in the positive direction from the mean than in the negative 
direction, and so is said to be positively skewed. If there is 
occasion to measure the amount of the skewness, an index is 
provided by formula (54):¹ 

$$ Sk = \frac{3(M - Md)}{\sigma}. \tag{54} $$

[Fig. 42] 

Table 47.—Relative Numbers of Divorced Couples by Years Married 

    Years          Divorced       Accumulated 
    married (X)    couples (f)    frequency 
    0-0.9              15              15 
    1.0-1.9            72              87 
    2.0-2.9            60             147 
    3.0-3.9            43             190 
    4.0-4.9            21             211 
    5.0-5.9            17             228 
    6.0-6.9             9             237 
    7.0-7.9             8             245 
    8.0-8.9             5             250 
    9.0-9.9             2             252 
    Total             252 


The values of Sk by this formula vary between ±3, but values 
larger than ±1 do not often occur. If there is no skewness, 
Sk = 0. 

A more useful measure of skewness for some purposes is $g_1$, 
which for large samples is approximately 

$$ g_1 = \frac{\nu_3}{\sigma^3}. \tag{58}* $$

$\nu_3$ is the third moment about the mean of the distribution, defined 
by the equation $\nu_3 = \Sigma f x^3 / N$, where x is a mean deviate as 
usual. 

For a normal distribution $g_1$ = 0. For other values of $g_1$ the 
sign indicates the direction of the skewness. Values of $g_1$ as 
great as ±2 mean decided skewness. 

A frequency distribution may also depart from the normal in 
height or peakedness. This is called kurtosis. If the observed 
distribution is flatter than the normal, it is said to be platykurtic; 
if more peaked, leptokurtic; if neither, mesokurtic. Kurtosis may 
be measured by $g_2$. For large samples, an approximate formula is 

$$ g_2 = \frac{\nu_4}{\nu_2^2} - 3 \tag{59} $$

or 

$$ g_2 = \frac{\nu_4}{\sigma^4} - 3. $$

$\nu_4$ is the fourth moment, $\Sigma f x^4 / N$, of the distribution; and $\nu_2^2$ is 
the second moment, $\nu_2 = \sigma^2 = \Sigma f x^2 / N$, squared. 

$g_2$ also is zero for a normal distribution. A positive value of 
$g_2$ indicates that the observed distribution is more peaked than 
the normal, and a negative value indicates that it is flatter. 

¹ We saw in Chap. VII that the value of the mean, M, is influenced by 
extreme values, and hence by skewness, but that the value of the mode, Mo, 
is not affected. In the present chapter it was learned that in a normal 
distribution the mean, mode, and median all have the same value. These 
facts suggest as an approximate measure of absolute skewness, Sk, the 
difference 

$$ Sk = M - Mo. \tag{55} $$

To change this to generalized units, we may write 

$$ Sk = \frac{M - Mo}{\sigma}. \tag{56} $$

Because the value of the mode can seldom be accurately determined, 
however, it is considered preferable to replace it by its equivalent in terms of 
the median, Md. In any moderately skewed distribution, the median falls 
about two-thirds of the distance from the mode to the mean (see Chap. VII, 
Fig. 31). We therefore have 

$$ Mo = M - 3(M - Md). \tag{57} $$

Substituting this value of the mode in formula (56), we obtain formula (54). 

* ν is the lower-case Greek letter nu. 






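Before formula (58) or (59) can conveniently be applied to a 
distribution like that of Table 47, some short-cut calculating 
formulas are needed: 

$$ \nu_2 = \sigma^2 = i^2 \left[ \frac{\Sigma fd^2}{N} - \left( \frac{\Sigma fd}{N} \right)^2 \right], \tag{60} $$

$$ \nu_3 = \frac{i^3}{N} \left[ \Sigma fd^3 - \frac{3}{N}\, \Sigma fd\, \Sigma fd^2 + \frac{2}{N^2} (\Sigma fd)^3 \right], \tag{61} $$

$$ \nu_4 = \frac{i^4}{N} \left[ \Sigma fd^4 - \frac{4}{N}\, \Sigma fd^3\, \Sigma fd + \frac{6}{N^2}\, \Sigma fd^2 (\Sigma fd)^2 - \frac{3}{N^3} (\Sigma fd)^4 \right], \tag{62} $$

where i = width of class interval, 
d = unit step deviation from an assumed mean, 
N = Σf. 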

Notice that formulas (61) and (62) are merely extensions of the 
familiar short method of finding a standard deviation by the use 
of an assumed mean and unit step intervals. This appears 
clearly in Table 48, below. 

Let us now measure the skewness and kurtosis of the distribution 
shown in Table 47, by comparing it with the normal curve. 
We set up the computing table: 

Table 48.—Computing Table for Moments: Data of Table 47 

    Years married     f     d     fd     fd^2     fd^3      fd^4 
    0-0.9            15    -2    -30       60     -120       240 
    1.0-1.9          72    -1    -72       72      -72        72 
    2.0-2.9          60     0      0        0        0         0 
    3.0-3.9          43    +1    +43       43       43        43 
    4.0-4.9          21    +2    +42       84      168       336 
    5.0-5.9          17    +3    +51      153      459     1,377 
    6.0-6.9           9    +4    +36      144      576     2,304 
    7.0-7.9           8    +5    +40      200    1,000     5,000 
    8.0-8.9           5    +6    +30      180    1,080     6,480 
    9.0-9.9           2    +7    +14       98      686     4,802 
    Total           252          154    1,034    3,820    20,654 

Recalling the short formula for the mean, 

$$ M = A + \frac{i\, \Sigma fd}{N}, $$

where A is the assumed mean, we find for this table, 

$$ M = 2.5 + 1\left(\frac{154}{252}\right) = 3.11. $$

Substituting in the formula for the median, where the symbols 
have the meanings explained in Chap. VII, we find 

$$ Md = 2.0 + \left(\frac{126 - 87}{60}\right)(1) = 2.65. $$

For the standard deviation, we have 

$$ \sigma = i \sqrt{\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2} = 1 \sqrt{\frac{1{,}034}{252} - \left(\frac{154}{252}\right)^2}, $$

$$ \sigma = 1.93. $$

Hence, according to formula (54), we find the skewness to be 

$$ Sk = \frac{3(3.11 - 2.65)}{1.93} = 0.72. $$


This shows considerable skewness in the positive direction. 

Let us next measure the amount of skewness in Table 48 by 
the use of formula (58). From formulas (60) and (61) we find 

$$ \sigma^2 = (1.93)^2 = 3.73, $$

$$ \nu_3 = \frac{1}{252} \left[ 3{,}820 - \frac{3}{252}(154)(1{,}034) + \frac{2}{(252)^2}(154)^3 \right] = 8.09. $$

Substituting in formula (58), 

$$ g_1 = \frac{8.09}{(1.93)^3} = 1.13. $$

This result agrees with that obtained by formula (54), in 
showing positive skewness. 

We shall now measure the degree of kurtosis, if any, exhibited 
by the distribution of Table 48, through the use of formula (59). 
We need only one new value, $\nu_4$, which may be found by formula 
(62). 

$$ \nu_4 = \frac{1}{252} \left[ 20{,}654 - \frac{4}{252}(3{,}820)(154) + \frac{6}{(252)^2}(1{,}034)(154)^2 - \frac{3}{(252)^3}(154)^4 \right] = \frac{13{,}528}{252} = 53.7. $$

Substituting in formula (59), 

$$ g_2 = \frac{53.70}{(3.73)^2} - 3.00 = 3.86 - 3.00 = 0.86. $$

The value of $g_2$ is positive, so we conclude that the observed 
distribution is leptokurtic, or more peaked than a normal curve.¹ 
Even though a sample distribution is found by the above 
methods to differ from the normal, the question arises whether or 
not the difference is one that might be due merely to random 
errors of sampling. This point is dealt with in Chap. XIII. 
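As a check on the moment calculations, the following Python sketch recomputes the third and fourth moments about the mean from the column totals of Table 48 and then forms the two measures. The names nu3, nu4, g1, and g2 are chosen here for illustration. Because the sketch carries full precision instead of the rounded σ = 1.93, its value for the skewness measure comes out slightly below the 1.13 of the text, while the kurtosis measure agrees with 0.86.

```python
N = 252
# Column totals: sums of f*d, f*d^2, f*d^3, f*d^4 (in class-interval units).
s1, s2, s3, s4 = 154, 1034, 3820, 20654

# Moments about the assumed mean.
m1, m2, m3, m4 = (s / N for s in (s1, s2, s3, s4))

# Moments about the true mean.
variance = m2 - m1 ** 2
nu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
nu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4

g1 = nu3 / variance ** 1.5     # skewness measure
g2 = nu4 / variance ** 2 - 3   # kurtosis measure (zero for a normal curve)

print(round(nu3, 2), round(nu4, 1), round(g1, 2), round(g2, 2))
```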


Exercises 

1. Twelve children are to be used in the experimental study of domi¬ 
nating and submissive types of behavior. Each child is to be grouped 
(a) with one other child, (b) with two other children. What is the total 
possible number of such experimental groups of each size? 

2. Four villages, five cities, and five rural counties are to be grouped 
in all possible combinations of five. No distinction is made between 
areas of the same type, i.e., one village is the equivalent of another 
village. What is the total number of combinations? Describe them. 

3. The types of contact between families in a community are listed
as: visit, church, lodge, school, business, and "other." But any or all
of these contacts may appear together, as well as separately. How
many combinations of all kinds are there between these several types
of contact?

4. The educational levels of a sample of husbands and wives are 
recorded as college, high school, grades, and illiterate. What is the 
total number of possible permutations of husband-wife relationships 
in terms of these levels, and what are they? 

5. How many marriages are possible between three pairs of brothers
and sisters in our society?

¹ Another measure of kurtosis that is more commonly used than g₂ is β₂:

$$\beta_2 = \frac{\nu_4}{\sigma^4}. \tag{63}$$

For Table 48, above, β₂ = 3.86. Since in a normal distribution β₂ = 3,
the observed distribution is again seen to be leptokurtic.





6. In an experiment with four pairs of subjects, each pair consists of
a male and a female, closely "matched" in respect to certain sociological
characteristics. They are to be given a test while seated around a table
in such a way that the sexes alternate, and no members of a matched
pair sit next to each other. In how many ways may this be done?

Note: The number of different permutations of n things taken n at
a time when arranged in a circle is given by the formula (n − 1)!

7. Gist and Clark give the following table: 


Distribution of Intelligence Scores of 2,544 (Kansas) Rural High-
school Students in 1923, According to Present Rural and
Urban Classification*

I.Q.               Urban     Rural     Total
Under 95             378       832     1,210
95-104               326       472       798
105 and over         260       276       536

Total                964     1,580     2,544

* American Journal of Sociology, July, 1938, p. 43.

Compare the observed frequencies with those expected by chance alone,
apply the χ² test, and comment on the results.

8. Classification of many cases shows that the probability of a mar¬ 
riage ending in divorce under certain conditions is 0.20. In a sample of 
20 such marriages, what is the probability that there will be no divorce? 
What is the probability that there will be no more than two divorces? 
Compare the results from the binomial with those from the normal 
curve. 

9. In Exercise 8, if many random samples of 20 marriages each were
taken from the type of marriage referred to, (a) What mean number
of marriages per sample would be expected to end in divorce? (b) What
would be the standard deviation of the numbers of marriages ending in
divorce found from many samples?

10. Calculate skewness and kurtosis for the distributions below: 


Failures on Parole in 50 Subsamples of Five Prisoners Each
Failures Frequency 

0 1 

1 10 

2 17 

. 3 16 

4 7 

5 J3 

Total. 60 










Families by Size

Persons          Frequencies
1                     24
2                     70
3                     62
4                     52
5                     36
6                     23
7                     14
8                      8
9                      6
10                     3
11                     1
12 or more           ...

Total                299


Families Classified by Age of Man Head

Age, years       Frequency
Under 25              13
25-34                 59
35-44                 71
45-54                 57
55-64                 37
65-74                 19
75 and over            5

Total                262


References 

Baten, W. D.: Mathematical Statistics, Chaps. IV, V, VI, and VII, John
Wiley & Sons, Inc., New York, 1938.

Camp, B. H.: Mathematical Part of Elementary Statistics, Part I, Chaps. II
and V; Part II, Chaps. I, II, D. C. Heath & Company, Boston, 1931.

Croxton, F. E., and D. J. Cowden: Applied General Statistics, Chap. X,
pp. 333-337, Prentice-Hall, Inc., New York, 1939.

Fine, H. B.: College Algebra, pp. 393-407, Ginn and Company, Boston, 1904.

Fry, T. C.: Probability and Its Engineering Uses, Chaps. I, II, and III,
D. Van Nostrand Company, Inc., New York, 1928.

Holzinger, K. J.: Statistical Methods for Students in Education, Chaps.
XI-XII, Ginn and Company, Boston, 1928.

Richardson, C. H.: Introduction to Statistical Analysis, Chaps. V, IX, and
X, Harcourt, Brace and Company, Inc., New York, 1934.

Smith, James G.: Elementary Statistics, Part IV, Henry Holt and Company,
Inc., New York, 1934.

Tippett, L. H. C.: The Methods of Statistics, 2d ed., Chaps. I, II, and IV,
Williams and Norgate, Ltd., London, 1937.

Treloar, A. E.: Elements of Statistical Reasoning, Chaps. V, VI, XII, XV,
and XVI, John Wiley & Sons, Inc., New York, 1939.

Yule, G. U., and M. G. Kendall: An Introduction to the Theory of Statistics,
Chaps. IX, X, and XXII, Charles Griffin & Company, Ltd., London,
1937.
























CHAPTER X 


GROSS RELATIONSHIP BETWEEN TWO FACTORS: 
SIMPLE LINEAR QUANTITATIVE CORRELATION 

One of the most common purposes of social research is to dis¬ 
cover whether or not there is any relationship between two 
factors, and to measure the amount of the relationship. For 
example, does the number of children in a family tend to decrease 
as the family income increases? If treated statistically, this 
kind of question is called a problem in correlation. As will be 
seen below, statistics is able to measure the amount of relation¬ 
ship (correlation) present in such cases, to provide an equation 
by which one of the factors can be predicted from a knowledge 
of the other, and to estimate the range of error in the predictions. 

1. The Scatter Diagram: Ungrouped Data. —As an introduc¬ 
tion to the method of simple linear correlation applied to un¬ 
grouped data, let us test the idea that the largest percentage 
increases of population in the United States between 1920 and 
1930 occurred in regions where the density of population per 
square mile was least in 1920. We shall limit ourselves here to 
examining the amount of correlation in the nine census divisions. 
The necessary figures are given in Table 49. 

Table 49.—Percentage of Population Increase, 1920-1930 (Y), in
Relation to Population per Square Mile in 1920 (X), by
Geographic Divisions, United States*

Division                  X       Y
New England             119      10
Middle Atlantic         223      18
East North Central       88      18
West North Central       25       6
South Atlantic           52      13
East South Central       50      11
West South Central       24      19
Mountain                  4      11
Pacific                  18      47

* From Abstract of the Fifteenth Census of the United States, 1930, pp. 12-13.
















We may make a preliminary judgment by rough methods as to
whether or not any relationship is present between the X and Y
series. Taking the four largest values of X, we find the average
of the four corresponding Y values to be 14.75. For the four
smallest values of X, the average Y value is 20.75. In other
words, as the X values decrease, the Y values tend to increase,
on the average. This suggests that there is some negative
relationship between the two series.
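The rough comparison just described can be sketched in Python. The figures are those of Table 49; the variable names and the sorting step are merely one convenient way of picking out the four largest and four smallest values of X.

```python
# (division, X = population per square mile in 1920, Y = per cent increase 1920-1930)
divisions = [
    ("New England", 119, 10),
    ("Middle Atlantic", 223, 18),
    ("East North Central", 88, 18),
    ("West North Central", 25, 6),
    ("South Atlantic", 52, 13),
    ("East South Central", 50, 11),
    ("West South Central", 24, 19),
    ("Mountain", 4, 11),
    ("Pacific", 18, 47),
]

# Order the divisions by density, then average Y over each extreme group of four.
by_density = sorted(divisions, key=lambda row: row[1])
smallest_four = by_density[:4]
largest_four = by_density[-4:]

mean_y_smallest = sum(row[2] for row in smallest_four) / 4
mean_y_largest = sum(row[2] for row in largest_four) / 4

print(mean_y_largest, mean_y_smallest)
```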

A better way of prejudging correlation is by means of a
scatter diagram. The X and Y values are plotted on rectangular
coordinate paper, as shown in Fig. 43.¹ It is now seen that if
the point for the Pacific region is omitted, the remaining points
show no discernible tendency either to rise or to fall across the
table. Any correlation present must, therefore, be due to a
single case. It would be misleading to say that between 1920
and 1930 there was a tendency for population in the United
States to increase at a faster rate in thinly populated regions than
in thickly populated regions, when as a matter of fact this was
true in only one out of nine regions. There is accordingly no
point in going any further with this problem, unless we wish to
try areas smaller than census divisions.

Fig. 43.—Scatter diagram for Table 49.

¹ For example, the first pair of values constitute a point with the
coordinates (119, 10). To plot this point in Fig. 43, after drawing the
horizontal X axis and the Y axis perpendicular to it, we measure 119 units
from the origin at 0 along the X axis, then up 10 Y units parallel to the
Y axis, and there mark in the point.

Consider a second problem. Do the counties of Wisconsin
that have high birth rates also tend to have high death rates?
Waiving the objections that a county is not always a homogeneous
unit (e.g., a county may be half urban and half rural)
and that its population is often too small to yield reliable birth
and death rates, let us compare the first 20 counties of the state,
taken alphabetically, in 1935. The data are in Table 50.

Table 50.—Birth and Death Rates by Counties in Wisconsin, 1935*

County         Birth rate (X)  Death rate (Y)       XY         X²         Y²
Adams               18.6             9.7          180.42     345.96      94.09
Ashland             22.2            12.0          266.40     492.84     144.00
Barron              18.4            10.4          191.36     338.56     108.16
Bayfield            12.5             8.3          103.75     156.25      68.89
Brown               22.1            11.6          256.36     488.41     134.56
Buffalo             17.5             6.9          120.75     306.25      47.61
Burnett             17.2            10.3          177.16     295.84     106.09
Calumet             15.7             6.8          106.76     246.49      46.24
Chippewa            20.5            12.1          248.05     420.25     146.41
Clark               17.3             7.4          128.02     299.29      54.76
Columbia            17.4            13.9          241.86     302.76     193.21
Crawford            22.5            10.1          227.25     506.25     102.01
Dane                17.1            13.8          235.98     292.41     190.44
Dodge               14.4             9.2          132.48     207.36      84.64
Door                20.8             9.8          203.84     432.64      96.04
Douglas             16.2            12.2          197.64     262.44     148.84
Dunn                18.7             9.3          173.91     349.69      86.49
Eau Claire          22.0            12.2          268.40     484.00     148.84
Florence            17.8            10.5          186.90     316.84     110.25
Fond du Lac         17.3            11.1          192.03     299.29     123.21

Total              366.2           207.6        3,839.32   6,843.82   2,234.78

* From Report of the State Board of Health, Wisconsin, 1934-1935, p. 210.

$$M_x = \frac{366.2}{20} = 18.31 \qquad M_y = \frac{207.6}{20} = 10.38$$


We shall apply the device of the scatter diagram to these 
figures. The results are shown in Fig. 44. 

From Fig. 44, we notice first that the range taken by the
points is limited, none falling below 12 or above 23 on the X
scale, and none below 6 or above 14 on the Y scale. As a
general precaution, any correlation found for a given set of
data should not be assumed to exist outside the range of the
data. A man may accept a wage of 50 cents an hour to work
eight hours or perhaps even 12 hours without resting, but it
would be erroneous to suppose from this that he would continue
to work an indefinite number of hours at that rate. After 12 or
14 hours, it would probably require more than 50 cents to induce
him to work another hour. Thus the relationship between wages
(X) and length of work period (Y) would not be the same beyond
the range of 12 hours as within that range. Similarly, counties
with birth rates much below 12 or above 23 might show death
rates entirely out of line with what would be expected from the
relationship found between birth and death rates in the counties
included in the study.





A second fact shown by Fig. 44 is that there is a general
tendency for the points to rise in the positive direction along the
X scale. That is, as the birth rates in the counties increase, the
death rates tend to increase also. This indicates that there is
some positive correlation between the two kinds of rates that
seems worthy of further investigation. We would not expect a
high correlation, however, because the dots show considerable
scatter, instead of following one another in a continuous line
or curve.

It should be pointed out that if the data in Fig. 44 had fallen
instead of rising in the positive direction along the X scale, a
negative relationship would have been indicated. That is,
there would have been a tendency for the death rates to decline
as the birth rates increased. A negative correlation, of course,
shows just as much relationship as a positive correlation of the
same degree.

2. The Line of Regression: Ungrouped Data. —In simple cor¬ 
relation it is customary, whenever reasonable, to regard one of 
the factors, X, as an independent factor, and the other, Y, as a
dependent factor. Thus, above, the birth rate is taken as the
independent factor, X, and the death rate as the dependent
factor, Y, because the birth rate is believed to influence the
death rate, rather than vice versa.

Returning to Fig. 44, the next step in the attempt to measure
the amount of correlation between the X and Y factors is to
ask what is the form of the observed correlation. From inspection
of the figure, it appears that the simplest way to represent
the relationship is by means of a straight line. This is fortunate,
because the method of simple correlation that is described in
this chapter deals only with straight-line, or linear, relationships.
Relationships that take the form of curved lines are measured
by other methods. When it seems advisable to use a formal
mathematical test to determine whether or not a relationship is
linear, the description of such a test may be found in more
advanced texts.¹

Although, of course, no one line will fit all the points in
Fig. 44, mathematics furnishes a formula for determining the
line of best fit, which is usually called the line of regression
of Y on X. The general equation of a straight line is

$$Y_c = a_{yx} + b_{yx}X, \tag{65}$$

where a is the intercept of the line on the Y axis, and b is the slope
of the line with respect to the X axis, or the ratio of c to d in
Fig. 45. (This follows from the argument that at any point, P,
on the line, Y = a + c; but by definition b = c/d = c/X, or c = bX;
therefore, Y = a + bX.)

Fig. 45.—Meaning of the equation of a straight line.

¹ G. U. Yule and M. G. Kendall, An Introduction to the Theory of
Statistics, pp. 455-456, Charles Griffin & Company, Ltd., London, 1937.

To determine the values of the constants, a and b, that will
give the line of best fit, the following normal equations are used:




$$b_{yx} = \frac{\Sigma XY - N M_x M_y}{\Sigma X^2 - N M_x^2} = \frac{N\Sigma XY - \Sigma X\,\Sigma Y}{N\Sigma X^2 - (\Sigma X)^2} = \frac{\Sigma xy}{\Sigma x^2}, \tag{66*}$$

$$a_{yx} = M_y - b_{yx}M_x, \tag{67}$$

where the subscripts yx indicate the regression of Y on X.

From Table 50, we substitute in formula (66):

$$b_{yx} = \frac{3839.32 - 20(18.31)(10.38)}{6843.82 - 20(18.31)^2},$$

$$b_{yx} = .27516,\dagger$$

$$a_{yx} = 10.38 - .27516(18.31) = 5.34182.$$

Substituting these values of a and b in formula (65),

$$Y_c = 5.3418 + .27516X. \tag{68}$$

Putting X = 12.5 in formula (68), we have

$$Y_c = 5.3418 + .27516(12.5),$$

$$Y_c = 8.78130.$$

Letting X = 22,

$$Y_c = 11.39534.$$

Plotting these two calculated points, (12.5, 8.78) and (22,
11.395), in Fig. 44, we get the line of regression of Y on X there
shown. If X = 0, Yc = 5.34 = a.
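The substitution in formulas (66) and (67) can be retraced with a few lines of Python. The sketch below starts from the totals and means of Table 50; the function name predict is introduced here only for illustration and stands for regression equation (68).

```python
N = 20
sum_xy = 3839.32   # sum of the XY column of Table 50
sum_x2 = 6843.82   # sum of the X^2 column of Table 50
m_x, m_y = 18.31, 10.38

# Formula (66): slope of the regression of Y on X.
b_yx = (sum_xy - N * m_x * m_y) / (sum_x2 - N * m_x ** 2)
# Formula (67): intercept on the Y axis.
a_yx = m_y - b_yx * m_x


def predict(x):
    """Calculated value Yc on the regression line for a birth rate x."""
    return a_yx + b_yx * x


print(round(b_yx, 5), round(a_yx, 4))
print(round(predict(12.5), 2), round(predict(22), 3))
```

The two calculated points returned by predict are the ones plotted in Fig. 44 to draw the line.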

If the origin is shifted to the means of the two series‡ (Fig. 46),
equation (65) becomes

$$y_c = bx, \tag{69}$$

where x and y are deviates from their respective means. For the
present problem, this gives

$$y_c = .27516x, \tag{70}$$

which is a simpler equation and often easier to handle than
equation (68). The y_c values calculated from this equation,
however, are of course not directly comparable with the observed
Y's. For that reason equation (68) is used to provide the values
in Table 51.

Fig. 46.—Shift of axes necessary to change regression line to mean deviate
form (y_c = bx).

* Also, see formula (88).

† These figures are carried to several decimal places to provide a check
in the summation of the third column of Table 51. If the work has been
correctly done, this column will sum approximately to zero.

‡ Notice that the mean of the Yc values calculated from the regression
equation is equal to the mean of the observed Y values. This may be shown
algebraically by replacing a in equation (65), above, with its equivalent
from equation (67):

$$Y_c = a_{yx} + b_{yx}X,$$

$$Y_c = M_y - b_{yx}M_x + b_{yx}X,$$

$$\frac{\Sigma Y_c}{N} = M_y - b_{yx}M_x + b_{yx}M_x,$$

$$\frac{\Sigma Y_c}{N} = M_y.$$

If the second equation above is expressed in terms of mean deviates, we get

$$(Y_c - M_y) = (M_y - M_y) - b_{yx}(M_x - M_x) + b_{yx}(X - M_x);$$

$$(Y_c - M_y) = b_{yx}(X - M_x)$$

or

$$y_c = b_{yx}x.$$

Subtracting My from each Y value and Mx from each X value in equation
(65) is equivalent to measuring all Y values from the mean of the Y's, and
all X values from the mean of the X's. That is, the Y axis in Fig. 46 is
simply moved to the right to the mean of the X's, and the X axis is moved
up a distance equal to the mean of the Y's. This, of course, places the
intersection of the two new axes at a point which has for its coordinates
the means of the two series (Mx, My). Since this point is the origin of the
system of axes from which all values of X and Y are to be measured, however,
it is convenient to give it the coordinates (0, 0). This is also necessary if we
express x and y in mean deviate form as in equation (69), because at the
point of the means the value of every mean deviate must be zero.

It follows from the second equation above that the regression line always
passes through the point (Mx, My), since, if we let X = Mx,

$$Y_c = M_y - b_{yx}M_x + b_{yx}M_x, \qquad Y_c = M_y.$$

The same fact appears if we let x = 0 in equation (69): y_c = b_yx(0), y_c = 0.

A measure of the goodness of fit of the regression line
Yc = 5.34 + .275X to the points in Fig. 44 is given by the
formula for the standard error of estimate, S_y:

$$S_y = \sqrt{\frac{\Sigma d^2}{N}}, \tag{71}$$

or

$$S_y = \sqrt{\frac{\Sigma Y^2 - a\Sigma Y - b\Sigma XY}{N}}, \tag{72}$$

where d is the difference between the observed and the calculated
Y values, and N is the number of paired values. The d's are
shown in Table 51:

Table 51.—Values of d and d²

Observed (Y)    Calculated (Yc)        d            d²
     9.7           10.45980        - .75980       .57730
    12.0           11.45037        + .54963       .30209
    10.4           10.40476        - .00476       .00002
     8.3            8.78132        - .48132       .23167
    11.6           11.42286        + .17714       .03138
     6.9           10.15712        -3.25712     10.60883
    10.3           10.07457        + .22543       .05081
     6.8            9.66183        -2.86183      8.19007
    12.1           10.98260        +1.11740      1.24858
     7.4           10.10209        -2.70209      7.30129
    13.9           10.12960        +3.77040     14.21592
    10.1           11.53292        -1.43292      2.05326
    13.8           10.04706        +3.75294     14.08456
     9.2            9.30412        - .10412       .01084
     9.8           11.06515        -1.26515      1.60060
    12.2            9.79941        +2.40059      5.76283
     9.3           10.48731        -1.18731      1.40971
    12.2           11.39534        + .80466       .64748
    10.5           10.23967        + .26033       .06777
    11.1           10.10209        + .99791       .99582

   207.6          207.59999          .00001     69.39083

$$S_y = \sqrt{\frac{69.39083}{20}} = 1.86,$$

or, using formula (72),

$$S_y = \sqrt{\frac{2234.78 - 5.34182(207.6) - .27516(3839.32)}{20}} = 1.86.$$








The standard error of estimate is like the standard deviation,
except that in the case of the latter the Y values are subtracted
from their mean, while in the case of the former they are
subtracted from the regression line, i.e., from the calculated Yc's.
Notice in Table 51 that the deviations from regression add to
zero, just as do mean deviations. If the distribution of Y values
is normal, two out of three of the observed Y's will not vary
from the regression line by more than one standard error of
estimate on each side. This may be shown graphically by
plotting in the range ±S_y from the regression line in Fig. 44.
Adding 1.86 to and subtracting it from Yc = 8.78 at X = 12.5,
and then doing the same with Yc = 11.40 at X = 22, gives a range
of 6.92-10.64 at the small end of the scale and a range of
9.54-13.26 at the large end. Accordingly, only six counties
(Buffalo, Calumet, Clark, Columbia, Dane, and Douglas) out
of the 20 are found to fall outside the range ±1S_y. Thus 30 per
cent of the cases exceed the range, compared with 32 per cent
in a strictly normal distribution. This close agreement is in
spite of the small number of counties in Table 50.
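The whole check can be repeated from the raw figures. The Python sketch below fits the line by least squares to the 20 pairs (birth rates from Table 50, observed death rates as listed in Table 51), confirms that the deviations from regression sum to zero, recomputes S_y by formula (71), and lists the counties whose deviation exceeds one standard error of estimate. The variable names are illustrative.

```python
import math

counties = ["Adams", "Ashland", "Barron", "Bayfield", "Brown", "Buffalo",
            "Burnett", "Calumet", "Chippewa", "Clark", "Columbia", "Crawford",
            "Dane", "Dodge", "Door", "Douglas", "Dunn", "Eau Claire",
            "Florence", "Fond du Lac"]
# Birth rates (X) and observed death rates (Y), in the same county order.
X = [18.6, 22.2, 18.4, 12.5, 22.1, 17.5, 17.2, 15.7, 20.5, 17.3,
     17.4, 22.5, 17.1, 14.4, 20.8, 16.2, 18.7, 22.0, 17.8, 17.3]
Y = [9.7, 12.0, 10.4, 8.3, 11.6, 6.9, 10.3, 6.8, 12.1, 7.4,
     13.9, 10.1, 13.8, 9.2, 9.8, 12.2, 9.3, 12.2, 10.5, 11.1]

n = len(X)
m_x, m_y = sum(X) / n, sum(Y) / n

# Least-squares slope and intercept in mean-deviate form.
b = (sum((x - m_x) * (y - m_y) for x, y in zip(X, Y))
     / sum((x - m_x) ** 2 for x in X))
a = m_y - b * m_x

# d = observed Y minus calculated Yc, as in Table 51.
residuals = [y - (a + b * x) for x, y in zip(X, Y)]

# Formula (71): standard error of estimate.
s_y = math.sqrt(sum(d * d for d in residuals) / n)

# Counties lying more than one standard error from the regression line.
outside = [name for name, d in zip(counties, residuals) if abs(d) > s_y]

print(round(s_y, 2), outside)
```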

There is, of course, seldom any reason for using a regression
equation to calculate values of Y for comparison with the data
from which the regression equation was obtained. A regression
equation is rather applied to new data for the purpose of making
predictions. For example, the usefulness of the regression
equation (68), based on Table 50, lies in telling us what death rates
to expect in counties that are not included in the table, or in a
year other than 1935.

Even in the prediction of individual Y values when r is low,
however, it is often possible to reach relatively safe conclusions
by noting the odds in their favor. For example, the most
probable value of Y corresponding to an X value of 18.6 was
found by substituting X = 18.6 in equation (68), giving
Yc = 10.46. In other words, if we know that a county had a
birth rate of 18.6, we can predict that its most probable death
rate is 10.46, and we can feel some confidence that its actual
death rate will not usually be below 8.60 or above 12.32 (i.e.,
10.46 ± 1.86). If we wish to be surer, the odds are about
20 to 1 in a normal distribution that the death rate of this
county will fall between 10.46 ± (1.86 × 2), i.e., between 6.74
and 14.18. If practical certainty is required, only once in some
369 times in a normal distribution will the death rate exceed
the range of 10.46 ± (1.86 × 3), or 4.88 to 16.04, inclusive.
The spread of possible error is now large, but the advantage over
random guessing is still considerable. This is usually true even
after making allowance for the fact that the distribution is not
normal, and for errors due to sampling.
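The three ranges quoted above follow directly from equation (68) and S_y = 1.86. A minimal Python sketch, assuming normally distributed deviations as the text does, is given below; the helper names band and coverage are introduced for illustration, and coverage uses the error function from the standard library to give the share of a normal distribution lying within k standard units of the mean.

```python
import math

a, b, s_y = 5.3418, 0.27516, 1.86   # constants of equation (68) and S_y


def band(x, k):
    """Range Yc - k*S_y to Yc + k*S_y for a county with birth rate x."""
    yc = a + b * x
    return yc - k * s_y, yc + k * s_y


def coverage(k):
    """Proportion of a normal distribution within k standard units of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    low, high = band(18.6, k)
    print(k, round(low, 2), round(high, 2), round(coverage(k), 4))
```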

The same principle applies to a variety of related questions,
e.g., What is the probability that a county with a birth rate of
17 will have a death rate as low as 8 or as high as 12? Substituting
X = 17 in regression equation (68), we find Yc = 10,
approximately. The difference between the expected death
rate of 10 and a death rate of 8 or 12 is ±2. If we regard the
death rates of all counties whose birth rate is 17 as normally
distributed about a mean of 10, with a standard deviation of
S_y = 1.86, then the difference ±2 lies 2.00/1.86 = 1.08 standard
deviation units above or below the mean. Referring to a table
of normal areas (Appendix Table 1), we see that practically 36
per cent of the area of the curve falls between the mean and an
ordinate at 1.08σ. Hence we may say that a deviation as great
as or greater than 1.08σ may occur above or below the mean
100.0 − 2(36) = 28 times in 100. The odds are therefore 72 to
28, or roughly 2½ to 1, against such an event.
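In place of the table of normal areas, the same probability can be obtained from the error function. The Python sketch below is one way of doing this under the normality assumption stated above; normal_cdf is a helper defined here, not a quantity from the text.

```python
import math


def normal_cdf(z):
    """Area of the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


s_y = 1.86
yc = 5.3418 + 0.27516 * 17     # expected death rate at X = 17, equation (68)
z = 2.0 / s_y                  # a difference of 2 in standard units

# Chance of a deviation at least this large in either direction.
p_two_tail = 2 * (1 - normal_cdf(z))
odds_against = (1 - p_two_tail) / p_two_tail

print(round(yc, 2), round(z, 2), round(p_two_tail, 2), round(odds_against, 1))
```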

In equations (65) and (66), b, which is the slope of the regression
line of Y on X, is called the regression coefficient. It is a
useful measure, since it shows the number of Y units that the
most probable value of Y changes for each unit change in X.
For example, in equation (68), Yc = 5.34 + 0.275X, the regression
coefficient is 0.275, which means that the most probable
value of Y increases 0.275 of a Y unit for every X unit that
X increases. If the equation were Yc = 5.34 − 0.275X, the
most probable value of Y would decrease 0.275 of a unit for
each unit that X increased.

3. The Coefficient of Correlation: Ungrouped Data. —Although 
the table of X and Y paired values (Table 50), the scatter
diagram (Fig. 44), the regression equation of Y on X (formula
(65)), the regression coefficient b, and the standard error of
estimate S_y give a great deal of information about the amount
and nature of the relationship between two variables, X and Y,
none of them furnishes in a single figure an index of the amount
of the relationship. This is supplied by the simple Pearsonian
coefficient of correlation, r, which for ungrouped data may be
found from the following formula:¹

$$r = \frac{\Sigma XY - N M_x M_y}{\sqrt{(\Sigma X^2 - N M_x^2)(\Sigma Y^2 - N M_y^2)}}. \tag{73}$$

Applying this formula to Table 50,

$$r = \frac{3839.32 - 20(18.31)(10.38)}{\sqrt{[6843.82 - 20(18.31)^2][2234.78 - 20(10.38)^2]}},$$

$$r = \frac{38.16}{105.27},$$

$$r = .36.$$

Since r is a coefficient that can vary only from 0 to ±1, this
is not a high value, indicating rather low relationship between
the birth rates and death rates in the 20 sample counties of

‘ Alternative formulas, which are sometimes convenient, are 


r 


r* 


r 

r 

r 

r 

r 


NXXY ~ XXXY 


VIAsA* - - (aF)*! 

a^Y + b'SXY - N^^y 


sy« 


XXY - 


XX Y - NMrMv 


Naxtry 



'^/byzhxyj 
<rx^ -f <ry^ — (f p^^ 
2 y/<Xx^ 


(74) 

(75) 

(76) 

(77) 

(78) 

(79) 

(80) 


where D refers to the differences between the raw paired values. This is 
known as the difference formula. 


Xxy ^ _1 2 ^ B 

NffxCy N arjg ay N 



ay 


(81) 

(82) 


where <r« is the standard deviation of the Fc*s calculated from the regression 
equation. See also formula (89). 





Wisconsin. It is about what would be expected from the scatter 
diagram (Fig. 44). 
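The value of r for Table 50 can be verified from the column totals alone. The Python sketch below applies the product-moment formula to the sums and means given in the table; the variable names are illustrative.

```python
import math

N = 20
sum_xy, sum_x2, sum_y2 = 3839.32, 6843.82, 2234.78   # totals of Table 50
m_x, m_y = 18.31, 10.38

numerator = sum_xy - N * m_x * m_y
denominator = math.sqrt((sum_x2 - N * m_x ** 2) * (sum_y2 - N * m_y ** 2))
r = numerator / denominator

print(round(numerator, 2), round(denominator, 2), round(r, 2))
```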

The labor of computing a correlation coefficient from ungrouped 
data can sometimes be reduced by dividing one or both series by 
some appropriate divisor, or by subtracting an arbitrary constant 
from the values of either or both series. As will be seen, this 
does not affect the value of r. The method also applies to the 
regression equation, provided the original values are restored. 
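That coding the data in this way leaves r unchanged can be illustrated numerically. In the Python sketch below, the constants 12 and 2 (for X) and 10 and 0.1 (for Y) are arbitrary choices made for the example, and pearson_r is a helper written here for the purpose.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Birth rates (X) and death rates (Y) of the 20 counties of Table 50.
X = [18.6, 22.2, 18.4, 12.5, 22.1, 17.5, 17.2, 15.7, 20.5, 17.3,
     17.4, 22.5, 17.1, 14.4, 20.8, 16.2, 18.7, 22.0, 17.8, 17.3]
Y = [9.7, 12.0, 10.4, 8.3, 11.6, 6.9, 10.3, 6.8, 12.1, 7.4,
     13.9, 10.1, 13.8, 9.2, 9.8, 12.2, 9.3, 12.2, 10.5, 11.1]

r_raw = pearson_r(X, Y)

# Subtract an arbitrary constant and divide by an arbitrary positive divisor.
X_coded = [(x - 12) / 2 for x in X]
Y_coded = [(y - 10) / 0.1 for y in Y]
r_coded = pearson_r(X_coded, Y_coded)

print(round(r_raw, 4), round(r_coded, 4))
```

The divisors must be positive; dividing one series by a negative number would reverse the sign of r.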

4. Size of Sample from Which r Is Calculated.—It is assumed 
throughout the discussion of this chapter that the coefficient of 
correlation, r, is not calculated from very small numbers of 
paired values, say less than 25. If this assumption is not met, 
and the data are regarded as a sample, many of the formulas 
given need correction. Since small-sampling theory is omitted 
from this text, the student may see certain references listed at
the end of this chapter for its treatment.¹

5. The Meaning of the Correlation Coefficient, r.—It has already
been seen that the standard error of estimate, S_y, around the
regression line for Table 50 is approximately 1.86. The variance
of the observed Y's is

$$\sigma_y^2 = \frac{2234.78}{20} - \left(\frac{207.6}{20}\right)^2,$$

$$\sigma_y^2 = 4.$$

If we compare S_y² with σ_y², we shall have a measure known as
the coefficient of alienation squared, k²:

$$k^2 = \frac{S_y^2}{\sigma_y^2} = \frac{(1.86)^2}{(2)^2} = 0.865. \tag{84*}$$

This shows that 86.5 per cent of the variance in county death
rates remains in the form of scatter around the regression
line, which is not controlled by the birth rates.

¹ See, for example, Yule and Kendall, Ezekiel, Fisher, and Croxton and
Cowden. The student should not be misled by the circumstance that, in the
example of Table 50, 20 pairs of values were treated as a large sample. This
was done only for convenience of illustration. Strictly, small-sample
methods should be used with 20 cases, although even for that size of sample
it often makes no important difference.

* Compare the distances of the dots from the regression line and from the
mean of the Y's in Fig. 44.

Again, by formula (78),

$$k^2 = 1 - r^2,$$

or

$$r^2 + k^2 = 1. \tag{85}$$

That is, r² and k² together account for 100 per cent of the variance
in Y. Since we have just seen that k² indicates the percentage
not controlled by X, r² = 1 − k² evidently indicates the percentage
controlled by X through the medium of the regression
equation. Thus, above, r² = (.36)² = .13, meaning that a correlation
of r = .36 accounts for only 13 per cent of the variance
of the Y series. This interpretation of r² is further clarified by
formula (82) squared,

$$r^2 = \frac{\sigma_c^2}{\sigma_y^2}.$$

Here the numerator, σ_c², is the variance of the Yc series calculated
from the regression equation, so that its value is entirely
controlled by X.

Substituting the values of r² and k² found in the illustrative
problem above in formula (85), we get

$$(.36)^2 + .865 = .995,$$

or 99.5 per cent, the slight variation from 100 per cent being due
to approximations in the calculation of r² and k².
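The effect of the approximations can be seen by repeating the calculation without rounding. In the Python sketch below, k² is formed from the unrounded S_y² and σ_y², and r is then recovered from 1 − k²; it returns to about .36, the value found earlier from the product-moment formula. The variable names are illustrative.

```python
import math

N = 20
sum_d2 = 69.39083   # sum of squared deviations from regression (Table 51)
sum_y2 = 2234.78    # sum of the Y^2 column (Table 50)
m_y = 10.38

var_y = sum_y2 / N - m_y ** 2   # variance of the observed Y's, unrounded
s_y2 = sum_d2 / N               # squared standard error of estimate

k2 = s_y2 / var_y               # share of variance left as scatter
r2 = 1 - k2                     # share accounted for through the regression
r = math.sqrt(r2)

print(round(var_y, 4), round(k2, 4), round(r2, 4), round(r, 4))
```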

Notice, in general, that an r as large as .71 is required to cut
the variance of Y by 50 per cent (if r² = .50, then
r = √.50 = .71).


Where both X and Y are assumed to be built up of simple elements
of equal variability all of which are present in Y but some of which
are lacking in X, it can be proved mathematically that r² measures that
proportion of all the elements in Y which are also present in X. For
that reason, in cases where the dependent variable is known to be
causally related to the independent variable, r² may be called the
coefficient of determination.¹

¹ Mordecai Ezekiel, Methods of Correlation Analysis, p. 120, John
Wiley & Sons, Inc., New York, 1930.

Although these assumptions seldom hold in practice, it is
customary to regard r² as a better measure of relationship than r.
At any rate, r² is a more conservative estimate.

Does the correlation between the birth rates and death rates in 
Table 50 mean that the birth rate is the cause of the death rate? 
Obviously, being born is not the cause of dying. Sanitary 
conditions, medical service, and various other factors determine 
death rates. It happens, however, that infants are more sus¬ 
ceptible to death by disease than are older children and adults, 
so for this reason, other things being equal, the population with 
the largest proportion of infants will have the highest death 
rate. In general, it may be said that the presence of simple
correlation between two factors may or may not be accompanied
by a direct or efficient causal connection between them. Often
simple correlation is due to common causes, as when teachers'
salaries and the amount of money spent for alcoholic beverages
rise and fall together with changes in business conditions. There
is much danger that this kind of correlation will be misinterpreted.
Sometimes, as in the case of the birth and death rates
above, one factor is a necessary antecedent but not a direct
cause of a correlated factor. Very rarely, two factors show a
high but purely accidental correlation, as the yield of potatoes
in Great Britain with, say, smallpox epidemics in the United
States. The safest interpretation is that the presence of correlation
between two factors indicates that as one increases the
other tends to increase or decrease, i.e., they vary together to some
extent. Why they vary together may be determined by further
statistical and experimental methods, such as those of partial
correlation and the laboratory, which seek to control the various
interfering factors involved.

Caution should be used in comparing two or more values of r.
It often happens that interfering factors, of which the investigator
takes no account, cause two r's that should be the same to differ
widely, or two r's that should differ widely to appear the same.
Unless "other things are equal," at least broadly, such comparisons
have little point.

6. A Convenient Formula for the Regression Equation When 
r Is Known. —When the value of r is found before the regression 
equation is set up, the latter may conveniently be obtained from 
the equation 



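$$Y_c - M_y = r\,\frac{\sigma_y}{\sigma_x}(X - M_x), \tag{86}$$

or

$$y_c = r\,\frac{\sigma_y}{\sigma_x}\,x. \tag{87}$$

Comparing formula (87) with formula (69), it is seen that

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x},$$

or

$$r = b_{yx}\,\frac{\sigma_x}{\sigma_y}. \tag{88}$$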

7. Simple Linear Correlation Applied to Grouped Data. —The 

method of dealing with simple linear correlation developed above 
applies to ungrouped data, such as shown in Table 50. In the 
case of grouped data, the principles and procedures are the 
same, except that formulas (89) through (92) are specially 
adapted for use with frequency tables. 


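$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2\,\Sigma y^2}}, \tag{89}$$

where

$$\Sigma xy = \Sigma f_{xy} d_x d_y - \frac{\Sigma f_x d_x \cdot \Sigma f_y d_y}{N}, \tag{90}$$

$$\Sigma x^2 = \Sigma f_x d_x^2 - \frac{(\Sigma f_x d_x)^2}{N}, \tag{91}$$

$$\Sigma y^2 = \Sigma f_y d_y^2 - \frac{(\Sigma f_y d_y)^2}{N}, \tag{92}$$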


x and y are mean deviates, $d_x$ represents unit-step deviations
from an assumed mean of the X's, $d_y$ represents unit-step deviations
from an assumed mean of the Y's, N is the total frequency
of pairs in the table, $f_x$ is the total frequency of pairs in an X class
or column, $f_y$ is the frequency of pairs in a Y class or row, and
$f_{xy}$ is the frequency of pairs in a cell. These symbols appear in
the margins of correlation Table 53.
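The grouped formulas (89) through (92) may be easier to follow as a short Python sketch. The function name and the small 3 x 3 table are hypothetical and are not taken from the text. The sketch assumes the assumed means are placed at the first row and first column, so that the unit-step deviations are simply 0, 1, 2, and so on.

```python
import math

def grouped_r(freq):
    """Pearson r from a correlation table of cell frequencies.

    freq[i][j] is the number of pairs in Y class i (row) and X class j
    (column). Unit-step deviations are measured from row 0 and column 0.
    """
    n_rows, n_cols = len(freq), len(freq[0])
    N = sum(sum(row) for row in freq)
    f_y = [sum(row) for row in freq]                              # row totals
    f_x = [sum(freq[i][j] for i in range(n_rows)) for j in range(n_cols)]
    sum_fdx = sum(f * d for d, f in enumerate(f_x))
    sum_fdy = sum(f * d for d, f in enumerate(f_y))
    sum_fdx2 = sum(f * d * d for d, f in enumerate(f_x))
    sum_fdy2 = sum(f * d * d for d, f in enumerate(f_y))
    sum_fdxdy = sum(freq[i][j] * i * j
                    for i in range(n_rows) for j in range(n_cols))
    sxy = sum_fdxdy - sum_fdx * sum_fdy / N    # formula (90)
    sxx = sum_fdx2 - sum_fdx ** 2 / N          # formula (91)
    syy = sum_fdy2 - sum_fdy ** 2 / N          # formula (92)
    return sxy / math.sqrt(sxx * syy)          # formula (89)

# Hypothetical table, for illustration only
table = [[4, 2, 0],
         [1, 5, 2],
         [0, 2, 4]]
print(round(grouped_r(table), 3))
```

Since r does not depend on the unit of measurement, the result is the same whichever class is chosen as the assumed mean.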

It is reasonable that the proportion of children in a state's 
population should influence the percentage of the state's income 
that is spent for schooling. Let us measure the extent to which 
this is true. The data needed are in Table 52. For our purpose 
it is not necessary to weight the percentage figures by the state 
populations. 



Table 52.—Percentage of Population under 19 Years of Age in 1930,
and Percentage That School Expenditures Were of All
Income in 1928, by States*

                    Per cent of popula-     Per cent school
State               tion under 19 years     expenditures were of
                    of age, 1930 (X)        all income, 1928 (Y)

Southeast:
  Virginia                44.4                    2.61
  N. Carolina             49.3                    4.38
  S. Carolina             ...                     3.16
  Georgia                 46.3                    1.75
  Florida                 39.2                    5.76
  ...                     ...                     2.29
  Tennessee               43.8                    2.57
  Alabama                 ...                     2.74
  Mississippi             46.6                    3.94
  Arkansas                45.3                    2.55
  ...                     44.0                    2.61
Southwest:
  Oklahoma                44.2                    3.27
  Texas                   42.6                    2.57
  ...                     46.8                    3.40
  ...                     42.1                    3.67
Northeast:
  Maine                   37.3                    1.93
  N. Hampshire            35.2                    2.14
  ...                     37.0                    2.24
  ...                     35.1                    1.85
  R. Island               37.0                    1.89
  Connecticut             37.0                    2.46
  N. York                 33.6                    2.11
  N. Jersey               36.1                    3.20
  Delaware                35.9                    1.91
  Pennsylvania            39.4                    2.20
  ...                     37.2                    1.97
  ...                     46.1                    3.21
Middle States:
  Ohio                    36.1                    3.05
  Indiana                 36.5                    3.93
  Illinois                34.9                    2.28
  ...                     37.7                    3.92
  Wisconsin               38.0                    2.95
  ...                     38.3                    3.65
  Iowa                    37.2                    3.82
  Missouri                35.7                    2.46
Northwest:
  N. Dakota               45.4                    6.13
  S. Dakota               42.5                    5.78
  Nebraska                39.3                    3.95
  ...                     38.1                    4.24
  Montana                 39.0                    3.96
  Idaho                   42.8                    4.02
  ...                     39.2                    3.30
  ...                     ...                     3.29
  Utah                    46.1                    3.91
Far West:
  Nevada                  31.8                    3.33
  Washington              33.7                    2.80
  Oregon                  33.1                    3.31
  ...                     30.4                    3.25
United States             38.8                    2.74

*From T. J. Woofter, Jr., Landlord and Tenant on the Cotton Plantation, WPA
Research Monograph V, 1936, p. 141.

The 48 pairs of values in Table 52 are hardly enough to justify 
grouping, but are convenient for illustrating the grouped method. 
The entries in Table 53 are made from the ungrouped data of 
Table 52, as follows. X represents the percentage of the population
under 19 years of age, and Y is the percentage that expenditures
for school purposes were of total income in 1928. The
first state in Table 52 has X = 44.4, so it will fall somewhere
in col. 44.0-45.9 of Table 53. Since the corresponding Y value
is 2.61, a tally is entered in row 2.40-2.79 of col. 44.0-45.9.
Similarly, the second state has an X value of 49.3 and a Y value
of 4.38, so a tally is placed in col. 48.0-49.9 and row 4.00-4.39
of Table 53; and so on. After all the entries are tallied in the
cells, the tallies are counted and replaced by numbers.
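The tallying step can be sketched in Python as follows. The helper name is hypothetical. The class origins 30.0 and 1.60 are assumptions chosen to agree with the intervals quoted above (widths of 2.0 for X and 0.40 for Y); they are not stated in the text. Decimal arithmetic is used so that values lying on a class boundary are not misplaced by binary rounding.

```python
from collections import Counter
from decimal import Decimal

def class_lower_limit(value, start, width):
    """Lower limit of the class interval that contains value."""
    value, start, width = Decimal(value), Decimal(start), Decimal(width)
    steps = (value - start) // width   # whole class widths above the origin
    return start + steps * width

# The two states worked in the text; origins 30.0 and 1.60 are assumed.
pairs = [("44.4", "2.61"), ("49.3", "4.38")]
tally = Counter(
    (class_lower_limit(x, "30.0", "2.0"), class_lower_limit(y, "1.60", "0.40"))
    for x, y in pairs
)
for (col, row), count in tally.items():
    print(f"column beginning {col}, row beginning {row}: {count} tally")
```

Running the same loop over all 48 pairs would give the cell frequencies of the correlation table.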

In Table 53 we then see two ordinary frequency distributions,
X and Y, placed at right angles to each other and exhibiting
a double classification. The large figures in the cells are the
frequencies. Instead of making a scatter diagram, as we did
with ungrouped data, let us estimate the mean of the Y's in
each column of the table. Consider, for example, the column
with the heading 34.0-35.9. We have for the mean

$$\frac{(2.6 \times 1) + (2.2 \times 2) + (1.8 \times 2)}{5} = 2.12.$$

This may be marked by a small circle at the left side of the 
column, although if it did not interfere with reading the table 
it should be located at the mid-point of the column. Similar 
circles indicate the positions of the means of the other columns 
which have a frequency as large as five. An inspection of these 
means shows that they have an irregular tendency to rise in the 
positive direction across the table. This suggests some positive 
correlation between X and Y. However, the circles form more 
of a curve than a straight line, rising to a peak in the 38.0-39.9 
column and then descending slightly. If we suppose that we 
are dealing with a sample thrown up by a particular set of 
causes, some of the irregularities may be due to random factors 
and a small sample. But even if we make allowance for the 
extreme cases in cols. 38.0-39.9, 42.0-43.9, and 44.0-45.9, the 
curved effect is not lessened. To assume that the relationship 
is linear and estimate the amount of correlation on that basis 
will reduce the value of the coefficient slightly, compared with 
the use of a coefficient of curvilinear correlation. Since we 
cannot deal with curvilinear correlation here, we shall use the 
simpler straight-line hypothesis. There is also some justification 
for this in view of the fact that the large scatter indicates a low 
correlation in any case. 



Table 53.—Correlation of Percentage of Population under 19 Years of Age in 1930 (= X) with Percentage
That School Expenditures Were of All Income in 1928 (= Y) in Each State of the United States*

*From T. J. Woofter, Jr., Landlord and Tenant on the Cotton Plantation, WPA Research Monograph V.


The line of regression of Y on X, and dotted lines representing
±1$S_y$, the values for which are worked out below, are drawn in
the correlation table (Table 53). A study of them in relation
to the entries in the correlation table should be helpful, just as
it was in the case of the scatter diagram for ungrouped data
(see Fig. 44). It appears from Table 53 that the actual relationship
changes from strongly positive in the left half of the table
to moderately negative in the right half, whereas the linear
regression implies a constant positive correlation throughout.
Also, the linear equation is far from fitting the data of the two
halves of the table equally well. On the other hand, in only one
column does the proportion of items falling outside the range of
one standard error of estimate around the regression line exceed
the normal one-third. In practice it would probably not be
worth while to carry the analysis any farther. We shall, however,
use the table to show the steps involved in calculating the
Pearsonian correlation coefficient, r, the linear regression of Y
on X,¹ the standard error of estimate, $S_y$, and other statistics,
from grouped data.

Proceeding with Table 53, we enter unit-step deviations in row
(2) and col. (2). The entries in row (3) and col. (3) and in row
(4) and col. (4) are familiar and should be obvious from the
symbols. Next, we multiply each cell frequency first by $d_x$ and
place the product in the upper right-hand corner of the cell,
and then by $d_y$ and place the product in the lower left-hand
corner of the cell. The $d_x$ products are then added by rows and
the $d_y$ products by columns. Column (6) and row (6) are
obtained by multiplying the entries in col. (5) and row (5) by
$d_y$ and $d_x$, respectively, and the products are summed over the
column and the row.²

We finally substitute from Table 53 in formulas (90)-(92),

$$\Sigma xy = 79 - \frac{(19)(25)}{48} = 69.1,$$

$$\Sigma x^2 = 315 - \frac{(25)^2}{48} = 302,$$

$$\Sigma y^2 = 317 - \frac{(19)^2}{48} = 309.5.$$

¹ There is also always a regression line of X on Y, from which the most
probable values of X may be calculated for given values of Y. The two
regression lines are not the same. To find the regression of X on Y, simply
change places with X and Y in the equations given in this chapter.

² As a check on the work, notice that in Table 53 cols. (3), (5), and (6)
should have the same totals as rows (5), (3), and (6), respectively.


Substituting these values in formula (89),

$$r = \frac{69.1}{\sqrt{(302)(309.5)}} = \frac{69.1}{\sqrt{93{,}469}} = \frac{69.1}{305.7},$$

$$r = .23.$$
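The substitution can be checked with a short Python sketch. The variable names are illustrative only. The sums 79, 19, 315, and 317 and N = 48 are those of the worked example; the column sum of 25 is not legible in every copy and is here inferred from the printed results 69.1 and 302, so it should be treated as an assumption.

```python
import math

N = 48             # number of states
sum_f_dx_dy = 79   # sum of f_xy * d_x * d_y
sum_f_dx = 25      # sum of f_x * d_x (inferred, see the note above)
sum_f_dy = 19      # sum of f_y * d_y
sum_f_dx2 = 315    # sum of f_x * d_x squared
sum_f_dy2 = 317    # sum of f_y * d_y squared

sxy = sum_f_dx_dy - sum_f_dx * sum_f_dy / N   # formula (90)
sxx = sum_f_dx2 - sum_f_dx ** 2 / N           # formula (91)
syy = sum_f_dy2 - sum_f_dy ** 2 / N           # formula (92)
r = sxy / math.sqrt(sxx * syy)                # formula (89)

print(round(sxy, 1), round(sxx), round(syy, 1), round(r, 2))
```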

This value of r indicates very little relationship. Nevertheless,
for purposes of demonstration, we shall show the use of the
formulas for finding the regression equation of Y on X and the
coefficient of alienation, k. We have

$$b = \frac{\Sigma xy}{\Sigma x^2}, \tag{93}$$

$$b = \frac{69.1}{302},$$

$$b = 0.23.$$


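But this value of b is in terms of unit-step deviations or class
intervals. To change it back to scale units,

$$b_{yx} = b\,\frac{i_y}{i_x}, \tag{94}$$

where $i_y$ = class interval of Y,
$i_x$ = class interval of X.

$$b_{yx} = 0.23\left(\frac{.4}{2}\right) = 0.046.$$

$$a_{yx} = M_y - b_{yx}M_x, \tag{95}$$

$$a_{yx} = 3.16 - .046(40),$$

$$a_{yx} = 1.32.$$

Therefore, substituting in formula (65), we have

$$Y_c = 1.32 + .046X;$$

also,

$$\sigma_y^2 = (.4)^2\left[\frac{317}{48} - \left(\frac{19}{48}\right)^2\right],$$

$$\sigma_y^2 = 1.03,$$

$$S_y^2 = \sigma_y^2(1 - r^2) = 1.03(1 - .0529), \tag{96}$$

$$S_y^2 = .9755,$$

$$k^2 = \frac{S_y^2}{\sigma_y^2} = \frac{.9755}{1.03} = .95. \tag{97}$$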

Thus an r of .23 leaves 95 per cent of $\sigma_y^2$ as scatter around the
regression equation, or improves prediction only

$$r^2 = 1 - k^2 = .05,$$

or about 5 per cent, in terms of the variance of the Y's.
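The regression constants and the coefficient of alienation can be retraced with a short Python sketch. The variable names are illustrative only. The class intervals of 2.0 for X and 0.4 for Y are assumptions read from the column and row headings quoted earlier; the means 40 and 3.16 and the sums are those used in the worked example. As in the text, the slope is rounded to .23 before it is converted to scale units.

```python
N = 48
sum_xy, sum_x2 = 69.1, 302.0      # from formulas (90) and (91)
sum_f_dy, sum_f_dy2 = 19, 317     # unit-step sums for Y
i_x, i_y = 2.0, 0.4               # class intervals (assumed from the headings)
mean_x, mean_y = 40.0, 3.16       # means used in the text
r = 0.23

b_steps = sum_xy / sum_x2                 # slope in class-interval units
b_yx = round(b_steps, 2) * i_y / i_x      # slope in scale units
a_yx = mean_y - b_yx * mean_x             # intercept
var_y = i_y ** 2 * (sum_f_dy2 / N - (sum_f_dy / N) ** 2)   # variance of Y
var_est = var_y * (1 - r ** 2)            # squared standard error of estimate
k2 = var_est / var_y                      # coefficient of alienation, squared

print(round(b_yx, 3), round(a_yx, 2), round(var_y, 2), round(k2, 2))
```

The last figure shows directly that k squared equals 1 minus r squared, so the two coefficients always account for the whole of the variance between them.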

The student is asked to check the plotting of the regression 
line and the lines showing the standard error of estimate in 
Table 53. 

Regression equations (69), (86), and (87) also apply to grouped 
data. 

From the above, it is clear that there is little tendency for the
percentage of income expended for schools to be proportionate
to the percentage of children under 19 years old in the population
when states are taken as units and a linear relationship is assumed.
Apart from the latter assumption, which has already been discussed,
it may well be objected that a state is a large area,
within which very different relations between these two percentages
may exist. Thus a large city and a rural county in
the same state may be more sharply unlike in this respect than
two cities in separate states. For this reason, the average
relationship given for each state as a whole is likely to be unrepresentative,
and so to lack meaning. It would be much better
if the data were available by school districts, in which case a
higher correlation might be found.

8. The Rank Correlation.—A method of linear correlation that
takes account of the rank orders of paired items but disregards
their values is sometimes used for rough work, or when the
values of the items are not known. The formula is¹

$$\rho = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}, \tag{98}$$

where D is the difference between the ranks of a pair of items,
and N is the number of pairs.

As an illustration of the use of this formula, let us refer back
to Table 50, and rank the counties with respect to their death
rates and birth rates, as shown in Table 54. When there are
ties, as between Douglas and Eau Claire counties in death rates,
and Clark and Fond du Lac in birth rates, the tied items are
given the mean of the ranks they would occupy if they were not
equal, and the next item takes the rank just above the highest
rank used in finding the tied mean. For example, the ranks 7
and 8 are averaged to give (7 + 8)/2 = 7.5 as the mean rank of
Clark and Fond du Lac counties, and Columbia county has the
rank 9.

¹ ρ is the lower-case Greek letter rho.


Table 54.—Twenty Wisconsin Counties Ranked with Respect to
Birth Rates and Death Rates (Low to High)

County           Rank in birth   Rank in death   Difference      D²
                 rate (X)        rate (Y)        (D)

Bayfield              1               4             - 3            9
Dodge                 2               5             - 3            9
Calumet               3               1               2            4
Douglas               4              17.5           -13.5        182.25
Dane                  5              19             -14          196
Burnett               6              10             - 4           16
Clark                 7.5             3               4.5         20.25
Fond du Lac           7.5            13             - 5.5         30.25
Columbia              9              20             -11          121
Buffalo              10               2               8           64
Florence             11              12             - 1            1
Barron               12              11               1            1
Adams                13               7               6           36
Dunn                 14               6               8           64
Chippewa             15              16             - 1            1
Door                 16               8               8           64
Eau Claire           17              17.5           -  .5           .25
Brown                18              14               4           16
Ashland              19              15               4           16
Crawford             20               9              11          121

Total                                                            972







Substituting in formula (98),

$$\rho = 1 - \frac{6(972)}{20(400 - 1)} = .27.$$

Like r, the value of ρ may vary from +1.0 to −1.0.
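The rank computation can be verified with a brief Python sketch using the ranks of Table 54. The variable names are illustrative only, and the tied ranks are entered as the averaged values already shown in the table.

```python
# (county, rank in birth rate, rank in death rate) from Table 54
ranks = [
    ("Bayfield", 1, 4), ("Dodge", 2, 5), ("Calumet", 3, 1),
    ("Douglas", 4, 17.5), ("Dane", 5, 19), ("Burnett", 6, 10),
    ("Clark", 7.5, 3), ("Fond du Lac", 7.5, 13), ("Columbia", 9, 20),
    ("Buffalo", 10, 2), ("Florence", 11, 12), ("Barron", 12, 11),
    ("Adams", 13, 7), ("Dunn", 14, 6), ("Chippewa", 15, 16),
    ("Door", 16, 8), ("Eau Claire", 17, 17.5), ("Brown", 18, 14),
    ("Ashland", 19, 15), ("Crawford", 20, 9),
]

n = len(ranks)
sum_d2 = sum((x - y) ** 2 for _, x, y in ranks)   # sum of squared differences
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))         # formula (98)

print(sum_d2, round(rho, 2))
```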


Exercises 

1. a. What is the amount of relationship between the length of French
and English words in the accompanying table? Plot the data, and discuss
the scatter diagram. Is the relationship reasonably linear? Use
both the ungrouped and the grouped methods of calculating r as a
check. Do the two methods necessarily give exactly the same value of
r? Explain. Just what does r mean in this case?

Number of Letters in a Sample of French Words (X), and in Their
Nearest English Equivalents (Y)

b. Get the regression of Y on X from both the ungrouped and the
grouped data, as a check, plot the line, and explain what a and b mean
in the equation.

c. What is the most probable length of an English word corresponding
to a French word of six letters?


d. Within what range will the number of letters in the English words 
in (a) fall two times out of three? Ninety-five times out of 100? 
Practically always? 

e. What is the value of the coefficient of alienation squared, and 
what does it mean here? 

f. What is the coefficient of determination and its interpretation
in this problem?

g. Find the coefficient of rank correlation, ρ, for the same data, and
compare its value, meaning, and adequacy with r.

2. For the table below, find the value of r and of b, and compare
them in meaning.


Age of Fathers (Y) Correlated with Age of Sons (X)

                 X
   Y       25     27     29

   60       3      5      7
   65       2     11     14
   70              2      6


3. Find r* and for the following table, and explain their meaning. 

Number of Children in the First Generation of Six Families (X),
and the Average Number of Children in the Second Generation
of the Same Families (Y)

   X      3      4      6      7      9     15
   Y      3      2      4      4      5      5


4. a. By inspection, is there any relationship between the votes of
the states in 1876 and in 1932? If any, is it positive or negative?

Republican Vote for President in Nine States, 1876 and 1932

                    Per cent of vote Republican
State                  1876          1932

Massachusetts           58            48
New York                48            41
Wisconsin               55            32
Missouri                41            35
Virginia                41            ...
Mississippi             21             4
Louisiana               48             7
Nevada                  53            31
California              51            39









































b. What does the scatter diagram show?

c. What is the equation of the regression of Y on X, where X is the
percentage of the vote Republican in 1876, and Y is the percentage of
the vote Republican in 1932? Plot the line in the scatter diagram.

d. What is the standard error of estimate? Plot it in the scatter
diagram.

e. What is the most probable percentage of the vote Republican in
1932 of a state that voted 55 per cent Republican in 1876?

f. Assuming a normal distribution about the regression line, within
what limits of error will the percentage vote fall two out of three
times? 20 out of 21 times? Within what limits of error does it
actually fall in each case?

5. a. What does the scatter diagram show in the case of the accompanying
table of death rates in Connecticut and Massachusetts?

Death Rate in Connecticut and Massachusetts*

Year              1924    1923    1922    1921    1920    1919    1918

Connecticut       11.3    12.0    12.0    11.4    13.6    13.3    20.4
Massachusetts     12.0    13.0    12.8    12.2    13.8    13.6    20.9

* From B. H. Camp, The Mathematical Part of Elementary Statistics, p. 144. D. C. Heath
& Company, Boston, 1935.

b. If the death rate for Massachusetts is 12 in 1924, what is the most
probable death rate for Connecticut in the same year in terms of the
relationship between the two?

c. How much of the variance still remains as scatter in predicting a
death rate in Connecticut from one in Massachusetts?




CHAPTER XI 


GROSS RELATIONSHIP BETWEEN TWO FACTORS: 

NONQUANTITATIVE CORRELATION 

1. Qualitative Data.—The method of correlation described up
to this point has dealt with quantitative series only, e.g., birth
and death rates, and proportion of state income spent for education.
It often happens in sociological investigations, however,
that one needs to know the amount of relationship between
two factors, one or both of which are qualitative. Examples of
qualitative factors are rural or urban residence; personality
ratings like Annoying, Unsympathetic, Sympathetic; occupational
classes like Professional, Proprietor, Clerical, Skilled,
Unskilled; and so on. Methods for correlating data of this
type have been devised. Before using them, effort should be
made to convert the qualitative attributes into quantitative
variables, because the latter are usually more accurate and
reliable. Thus, a student might be classified by the number of
credits earned in college, rather than as Sophomore or Junior.

2. Reliability of Classification.—Since much depends on the
reliability with which the nonquantitative variables are classified,
it is advisable to have the classification repeated by two or more 
qualified persons. If the results are very different, better 
criteria for classification should be developed, or the problem 
dropped. 

This point may be illustrated. The questionnaire that the 
members of a class in statistics filled out regarding their previous 
training in mathematics called for the sex of each student. If 
it were desired to correlate success in mathematics with sex, 
the members of the class might be divided by sex, and then 
subdivided into, say, four groups according to the average 
grades received in mathematics. This would give a table like 
Table 55. 

The question of the reliability of the classification by class
standing in this table can be dismissed, because it is based on a
quantitative variable, the average grades received in mathematics.
The sex classification might be somewhat unreliable
if it depended merely on the Christian names of the students
in the questionnaires; but reference to the questionnaire used
shows that the students checked the words Male and Female.
The reliability of this classification can therefore also be accepted
with confidence. We may then proceed to find the amount of
relationship between the two factors in the table.

Table 55.—Students in a Statistics Class Grouped by Sex and Grades
Received in Mathematics

                   Students by class standing
Sex            1       2       3       4      Total

Male           4       6       6       3        19
Female         7      13      15      11        46

Total         11      19      21      14        65


Not all classifications are so simple as those in Table 55, however.
In Table 56, for example, a second competent person
classified only 66 cases out of each 100 in the same way that this
table shows, with respect to the economic status of the family.
This was considered sufficient reason for abandoning the table.


Table 56.—Economic Status of the Family in Which Parolee Was
Reared and Outcome on Parole

                                   Parole violators
Status            Parolees       Number      Per cent

Poor                 287            44          15.3
Moderate             261            26          10.0
Comfortable           59             6          10.2
Unknown               22             3            *

Total                629            79

* Sample too small to warrant an estimate.


3. Choice of a Method.—After the reliability of the classifications
in a nonquantitative correlation table has been established,
the question of how to calculate the amount of relationship
between the two factors in the table arises. The answer depends
on the nature of the particular factors to be correlated. It is
convenient to set up a key, as in Table 57, which will suggest
what method should be used in each case.

The terms in Table 57 need definition and illustration. Quantitative
means expressed in countable units, as crime rates or
heights of male freshmen. Qualitative refers to nonmeasured
traits, like those mentioned in the first paragraph of this chapter.
Qualitative Ordered refers to qualitative categories that can be
arranged in ascending or descending order, as Favorable, Indifferent,
Hostile. Qualitative Unordered applies to qualitative
categories that cannot be arranged in ascending or descending
order, e.g., Law, Medicine, Engineering. A dichotomous series
is a series of two mutually exclusive and exhaustive categories,
as Good, Not Good; Sick, Not Sick; Male, Female; College
Graduates, Others; Families with Less than Four Children,
Families with Four or More Children.

Table 57.—Key to Selected Methods of Nonquantitative Correlation

Variable A                       Variable B                       Method

Quantitative; several classes    Dichotomous                      Biserial, $r_{bis}$

Quantitative or qualitative:     Qualitative: ordered or          Contingency, C
  ordered or unordered;            unordered; several classes
  several classes                  or dichotomous

Dichotomous                      Dichotomous                      Tetrachoric, $r_t$
                                                                  Yule's Q
                                                                  Fourfold $r_4$


It is not feasible to deal here with more than the five methods 
listed in Table 57, though a number of less prominent methods 
are omitted. 

4. Biserial Correlation.—In a study of divorce data for the 
United States in 1929, it is desirable to know whether there is 
any correlation between the party to whom the divorce was 
granted and the number of children affected. The data are 
shown in the first four columns of Table 58. We have here a
quantitative series to be correlated with a dichotomous series.
According to the key in Table 57, this requires the biserial
method of correlation.

The biserial method that we shall employ assumes that the
dichotomous trait is normally distributed and continuous (i.e.,
there is no gap in the series, and no disarrangement of an ordered
series). The relationship must be linear, or that of a straight
line. In the present case the idea of normality at first seems to
have little meaning. However, if we think of the possibility of
measuring the extent to which the husband or the wife is responsible
for the granting of the divorce, and if it is reasonable to
suppose that one party will seldom be wholly the instigator,
but that in most cases both will be about equally involved, we
may perhaps assume that the distribution of the dichotomous
factor is fairly normal.

Since all reported divorces are included, the series is continuous.
As a rough test whether or not the relationship
is linear, the scatter diagram shown in Fig. 47 is used. In this
figure are plotted the percentages of the divorces granted to husbands
by the number of children affected. The trend, if any, is
very irregular. Where there are no children, a much larger percentage
of divorces is granted to the husband than where there are
children. When the number of children is very large, i.e.,
eight or nine, the proportion of divorces granted to the husband
falls to a minimum. When the number of children affected
ranges from one to seven, the proportion of divorces granted to
the husband remains practically stationary. The low percentages
of divorces granted to the husband when there are eight or
nine children may be unreliable because of the small number of
cases involved; but the circumstance that the percentage is low
both for eight children and for nine children tends to support
the observed figures. There seems to be little reason for calculating
the value of $r_{bis}$ in this case. We shall do so merely to
show the method.

Fig. 47.—Relation of percentage of divorces granted to husband and
number of children involved. (Horizontal axis: number of children.)


Table 58.—Divorces Granted, Classified According to Number of
Children Affected: 1929*

                          Divorces granted
Children                                                   Per cent to
affected     To husband      To wife       Total (f)      husband (q_x)
   (1)          (2)            (3)            (4)             (5)

    0          36,840         76,970        113,810          .3237
    1           8,385         32,223         40,608          .2065
    2           4,255         15,242         19,497          .2182
    3           1,841          6,161          8,002          .2301
    4             774          2,571          3,345          .2314
    5             352          1,191          1,543          .2281
    6             155            518            673          .2303
    7              68            245            313          .2173
    8              22            108            130          .1692
    9              16             77             93          .1720

  Total        52,708        135,306        188,014          .2803
  Mean           0.55           0.77           0.71

* From Marriage and Divorce, 1929, p. 41, U. S. Bureau of the Census. "Nine or more
children" taken as nine, and "no report as to children" disregarded.


Apparently, the relationship in Table 58 is not linear. We 
shall work out the correlation, however, on the assumption that 
it is linear. The difference is unimportant here. 

The formula for finding biserial r is

$$r_{bis} = \frac{M_2 - M_1}{\sigma} \cdot \frac{pq}{y}, \tag{99}$$

where $M_1$ is the mean of the smaller frequency distribution [cols.
(1) and (2) of Table 58], $M_2$ is the mean of the larger frequency
distribution [cols. (1) and (3)], σ is the standard deviation of the
total frequency distribution [cols. (1) and (4)], p is the proportion
that the total frequency of the larger distribution [col. (3)]
is of the grand total frequency [col. (4)], q = 1 − p, and y is the
height of the ordinate of a normal curve of unit area and unit
standard deviation at the point separating the area of the curve
into the proportions p and q, as found from Appendix Table 1.
The means and standard deviation required are calculated by
the usual method of unit-step deviations from an assumed mean.
The required values are





$$M_1 = 0.55,$$

$$M_2 = 0.77,$$

$$\sigma = 1.14,$$

$$p = \frac{135{,}306}{188{,}014} = .72,$$

$$q = 1.00 - .72 = .28.$$


To find y, we turn to Appendix Table 1. In Fig. 48 a normal 
curve is shown. As explained elsewhere, the values given in the 
body of the table represent the proportion of the area of the 
curve included between the mean ordinate (shown at zero in the 
figure) and ordinates erected at various distances, measured in 
standard deviation units, from the mean. Since here p = .72, 



we need to find the height of the ordinate which divides the 
curve so that .72 of its area falls to the left and .28 to the right. 
Evidently, .72 of the area will occupy the whole left half of the 
curve, and a proportion .72 — .50 = .22 will extend into the 
right half. Looking for .22 in the column of the table headed
"Area," we find as the nearest approximation to it the figure
0.2190, and note that the corresponding figure in the column
headed "Ordinate (y)" is 0.3372. We therefore have y = 0.3372,
and are ready to substitute in formula (99):

$$r_{bis} = \left(\frac{0.77 - 0.55}{1.14}\right)\frac{(.72)(.28)}{.3372},$$

$$r_{bis} = .12.$$


As would be expected from our preliminary analysis of Table 58 
and Fig. 47, the amount of linear relationship between divorces 
granted to husbands and the number of children affected is very 
slight. 
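The biserial computation can be sketched in Python with the standard library. The function name is hypothetical. Instead of reading the ordinate from a printed table of the normal curve, the sketch computes it with `statistics.NormalDist`, so y comes out very slightly different from the tabled 0.3372; the rounded inputs are those given in the text.

```python
from statistics import NormalDist

def biserial_r(mean_small, mean_large, sd_total, p):
    """Biserial r by formula (99), with p the proportion in the larger group."""
    q = 1.0 - p
    z = NormalDist().inv_cdf(p)   # point dividing the unit normal into p and q
    y = NormalDist().pdf(z)       # height of the ordinate at that point
    return (mean_large - mean_small) / sd_total * (p * q / y)

r_bis = biserial_r(mean_small=0.55, mean_large=0.77, sd_total=1.14, p=0.72)
print(round(r_bis, 3))
```

Computing the ordinate directly avoids the small error that comes from taking the nearest tabled area, though here the difference does not change the conclusion that the relationship is very slight.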



The sign of $r_{bis}$ indicates the direction of the relationship
between the quantitative factor and the proportion of cases in 
the distribution represented by p in formula (99). Here there 
is a slight positive association between number of children and 
divorces granted to the wife, or a slight negative association 
between number of children and divorces granted to the husband. 

The general conclusion from this analysis is that, if any correlation
is present at all, there is a very slight tendency for the
husband to receive the divorce relatively less often as the number 
of children increases. Much more informative, however, was 
the interpretation made from the scatter diagram in Fig. 47, 
that the proportion of divorces granted to the husband (1) was 
considerably greater where there were no children at all, (2) was 
little affected by increases in the number of children from one to 
seven, and (3) was a minimum when the number of children 
was eight or more. 

Biserial correlation is a special adaptation of the method of
correlation used in finding the Pearsonian coefficient of correlation,
r, for quantitative data. For this reason, $r_{bis}$ may be
regarded as the nearest approximation to r that can be found
when a quantitative series is correlated with a dichotomous
series.

5. The Coefficient of Contingency.—A total of 1,118 inmates
of a state prison were classified as murderers, sex offenders, and
property offenders. It was desired to know how much, if any,
correlation existed between these three criminal types and intelligence.
An intelligence test was given to all the men, with the
results shown in Table 59. This table contains one quantitative
series and one unordered qualitative classification. The key
in Table 57 indicates the method of contingency for finding the
amount of association present. This coefficient is based on the
Chi-square (χ²) method, and measures the amount of deviation
of the observed frequencies in the table from purely random or
chance frequencies. The method of finding the chance or
theoretical frequencies, $f_t$, is based on two elementary theorems
in the mathematics of probability which have already been
treated (see Chap. IX).

Table 59.—A Prison Population Classified by Type and by Intelligence Quotient*

* Adapted from an unpublished study by J. L. Gillin. $f_o$ = observed frequency; $f_t$ = expected frequency.

Thus, the probability that any criminal
will fall in, say, the first column of Table 59 is the ratio of the
total number that fall in that column to the total frequency
of the table, or $n_c/N$ = 70/1,118, where $n_c$ is the total frequency
in col. (1), and N is the total frequency of the table. Likewise,
the probability that any criminal will fall in, say, the first row
is the ratio of the total number that fall in that row to the total
frequency of the table, or $n_r/N$ = 17/1,118. Now the probability
of two independent events occurring together is the product
of the probabilities of their separate occurrences. Therefore,
the probability that any criminal will fall in both the first column
and the first row of the table is


(ff)(s) -' W - (i;ra)(i;iH) - 

This means that about one out of every 1,000 prisoners in Table 
59 may be expected by chance alone to fall in the cell common 
to col. (1) and row (1). Since there are 1,118 prisoners in the 
table, the expected frequency is 


inriiiN) 

”7/2“ 


17(70) 

( 1 , 118)2 


(1,118) = 1.0644. 


This formula may evidently be shortened, however, to 


fi 

giving for the above ft = 


juni 

= 1.0644, again. 


( 100 ) 

We now 


write this expected frequency in row (1) and col. (2) of Table 59. 
By use of formula (100), all of the expected frequencies are
calculated and entered in cols. (2), (7), and (12). This computation
is more easily done for any column by setting $n_i/N$ in
the calculating machine and multiplying it successively by the
total row frequencies, ${}_jn$. It is a general principle of the $\chi^2$ test
that no cell should have an expected frequency much less than five.
Any cell that offends in this respect should be combined
with the cell above or below it. For this reason, in Table
59 the frequencies of the first row and of the last two rows are
combined with those just below or above. Comparing now the
observed with the theoretical frequencies, we notice a considerable
amount of difference. This indicates some association
between the criminal classifications and intelligence. We
proceed to measure it by computing



$$\chi^2 = \sum \frac{(f_o - f_t)^2}{f_t}, \tag{101}$$

$$\chi^2 = 19.087 + 25.673 + 7.230 = 51.990.$$

Substituting in the formula for the coefficient of contingency, $C$,

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}}, \tag{102}$$

$$C = \sqrt{\frac{51.99}{1{,}118 + 51.99}} = \sqrt{.0444} = .21.$$

The amount of association between the types of criminals and
intelligence is seen to be low. If we regard our 1,118 prisoners
as a random sample, what is the probability that the value of $C$
is zero in the total population from which it was drawn? Before
we can refer this question to a table of $\chi^2$ (Appendix Table 2),
we must have regard for the proper degrees of freedom. It will
be recalled¹ that in each row and column of a contingency table
(e.g., Table 59), one of the cell frequencies is not “free,” because
it may be determined by subtraction from the marginal totals.
In any row or column, therefore, the number of free cell frequencies,
or degrees of freedom, is one less than the number of
cells (columns or rows). In Table 59 there are three columns
and six rows, so that the degrees of freedom for the whole table
are (3 − 1)(6 − 1) = (2)(5) = 10. With 10 degrees of freedom,
we find in Appendix Table 2 that a $\chi^2$ as great as 23 would occur
by chance only once in 100 trials. Since our $\chi^2 = 52$ is still
larger, we can be confident that the differences are not random.
That is equivalent to saying that the value of $C$ indicates a low
but genuine association between types of criminals and
intelligence.
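The steps above (expected frequencies, $\chi^2$, $C$, and degrees of freedom) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are invented here, and because the cell frequencies of Table 59 are not reproduced in this copy, the small `demo` table is hypothetical. Only the figures 17, 70, 1,118, and 51.990 are taken from the text.

```python
import math

def expected_frequencies(table):
    """Chance frequency of each cell: (row total)(column total) / N, as in formula (100)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (f_o - f_t)^2 / f_t over all cells, as in formula (101)."""
    expected = expected_frequencies(table)
    return sum((fo - ft) ** 2 / ft
               for obs_row, exp_row in zip(table, expected)
               for fo, ft in zip(obs_row, exp_row))

def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (N + chi2)), as in formula (102)."""
    return math.sqrt(chi2 / (n + chi2))

def degrees_of_freedom(n_rows, n_cols):
    return (n_rows - 1) * (n_cols - 1)

# Figures quoted in the text for Table 59.
first_cell = 17 * 70 / 1118                      # expected frequency, row (1), col. (1)
c_text = contingency_coefficient(51.990, 1118)   # coefficient of contingency
df_text = degrees_of_freedom(6, 3)               # six rows, three columns

# Hypothetical 2 x 3 table, used only to exercise the functions.
demo = [[20, 30, 50], [30, 30, 40]]
demo_chi2 = chi_square(demo)

print(round(first_cell, 4), round(c_text, 2), df_text, round(demo_chi2, 3))
```

The computed $\chi^2$ would then be compared with a tabled value for the stated degrees of freedom, as the text does with Appendix Table 2.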

$C$ is usually found from a shorter formula² than that used above:

$$C = \sqrt{\frac{S - 1}{S}}, \tag{103}$$

where

$$S = \sum\left(\frac{f_o^2}{n_i\,{}_jn}\right). \tag{104}$$

¹ See Chap. IX, p. 148.

² For the derivation of this formula, see Karl J. Holzinger, Statistical
Methods for Students in Education, p. 275, Ginn and Company, Boston, 1928.



The value in parentheses in (104) is calculated for each cell of the
table, and these cell values are summed over the table. Thus
for the cell 80-89 in col. (6),

$$\frac{f_o^2}{n_i\,{}_jn} = \frac{(15)^2}{248(123)} = .0074.$$

Formula (103), however, does not provide a value of $\chi^2$ by
which to test the significance of the association found.

The coefficient of contingency has the defect that it understates
the amount of correlation actually present, and the more so
the fewer the cells in the table. For a 3 × 3 table
having perfect correlation, $C$ would not be 1.00, as it should, but
.816; for a 5 × 5 table, the maximum value of $C$ is .894; for a
7 × 7 table, .926; for a 10 × 10 table, .949. Evidently $C$ is not
comparable between tables with different numbers of cells.
For these reasons, it is well to apply $C$ only to tables having
say from 25 to 100 cells.
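The ceilings quoted above can be checked directly. The sketch below relies on a standard result that the text does not state: in a square $k \times k$ table with perfect association, $\chi^2 = N(k - 1)$, so that the largest possible $C$ is $\sqrt{(k-1)/k}$. The function name is illustrative only.

```python
import math

def max_contingency(k):
    """Largest attainable C in a k x k table, assuming chi2 = N(k - 1) at perfect association."""
    return math.sqrt((k - 1) / k)

ceilings = {k: round(max_contingency(k), 3) for k in (3, 5, 7, 10)}
print(ceilings)
```

Because the ceiling rises with $k$, two values of $C$ are comparable only when they come from tables of the same dimensions.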

It is possible to correct $C$ to some extent for the above fault
in cases where the correlation table has a fairly normal surface, as
shown by the row and column totals in ordered series. For this
purpose Table 60 may be used in connection with formula (105):

$$\text{corrected } C = \frac{C}{t_r\,t_c}. \tag{105}$$

If for the moment we regard Table 59 as normal, we have from
Table 60 for three columns $t_c = .859$, and for six rows $t_r = .959$,
so that

$$\text{corrected } C = \frac{.21}{.959(.859)} = .25.$$


Table 60.—Factors for Correcting C for Broad Grouping*

Number of rows    Correction factor    Number of rows    Correction factor
or columns        (t_r, t_c)           or columns        (t_r, t_c)
      2                .798                  9                .981
      3                .859                 10                .985
      4                .915                 11                .987
      5                .943                 12                .989
      6                .959                 13                .991
      7                .970                 14                .992
      8                .976                 15                .993

* From C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their Mathematical
Bases, p. 308, McGraw-Hill Book Company, Inc., New York, 1940.







The change in the value of $C$ in this case is slight, and will
always be so where the original value of $C$ is low. The correction
is therefore worth making only when the value of $C$ is fairly
high. Moreover, in the present case, one of the series in Table
59 is unordered, so we are not justified in regarding it as approximately
normal in form, or in applying this correction to the $C$
obtained from it.
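A minimal sketch of the correction in formula (105), using the factors of Table 60 as a lookup. The dictionary and function names are assumptions made for illustration; the factors themselves are copied from Table 60.

```python
# Correction factors from Table 60, keyed by the number of rows or columns.
CORRECTION_FACTOR = {
    2: .798, 3: .859, 4: .915, 5: .943, 6: .959, 7: .970, 8: .976,
    9: .981, 10: .985, 11: .987, 12: .989, 13: .991, 14: .992, 15: .993,
}

def corrected_c(c, n_rows, n_cols):
    """Divide C by the row and column factors, as in formula (105)."""
    return c / (CORRECTION_FACTOR[n_rows] * CORRECTION_FACTOR[n_cols])

# The text's example: C = .21 from a table with six rows and three columns.
example = corrected_c(0.21, n_rows=6, n_cols=3)
print(round(example, 2))
```

As the text cautions, the correction presupposes ordered, roughly normal series, so this call only reproduces the arithmetic and does not justify applying it to Table 59.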

A coefficient of contingency, $C$, needs perhaps even more
careful interpretation than other coefficients of correlation. In
the first place, it has no sign, so that its meaning is dependent
upon an examination of the correlation table itself. When both
series are ordered, it is possible to assign a sign to $C$; otherwise,
not. In Table 59, the prisoner classification is unordered, so
the $C$ we found can have no sign. Notice also that the sizes
of the $\chi^2$ for the three classes of criminals are not comparable,
because the number of prisoners is different in each class. We
may, however, compute the mean I.Q. for each of the three
classes, and in that way note how they compare in intelligence.
Thus we find that property offenders are most intelligent with
an I.Q. of 79.96, while murderers and sex offenders are approximately
equal with I.Q.'s of 75.71 and 74.61, respectively. If
the categories were Life Sentence, Medium Sentence, Short
Sentence, instead of Murderers, Sex Offenders, Property Offenders,
the sign of $C$ might be regarded as negative, since intelligence
increases as the length of prison sentence decreases. If neither
factor in the table was quantitative, means could not be computed.
In that case, we could only compare the columns with
respect to the proportions of their frequencies falling in each
category of the stub.

6. Correlation in Fourfold Tables.—Any scale may be divided
into just two parts, or dichotomies. For example, we may measure
head lengths, and then classify heads below a certain length
as short, and those of this length and above as long. Many
sociological variables that have never been measured are commonly
treated as dichotomies, e.g., Cooperative, Not Cooperative.
Some information is gained if a more detailed breakdown
is feasible, such as Completely Cooperative, Very Cooperative,
Average Cooperative, Uncooperative, Completely Uncooperative.

Some qualities are most conveniently regarded as attributes
rather than as quantitative variables, and naturally take a
dichotomous form. Examples are Violator of Parole, Nonviolator
of Parole; White Race, Other Race.

The measurement of the amount of relationship in a 2 X 2 
table is usually rather rough and inexact, regardless of what 
method is used. On this account, such a table is often merely 
tested for the presence of relationship, without attempting to 
measure it. The Chi-square test, explained in Chap. IX, is 
commonly relied on for this purpose. 

Suppose we are interested in whether or not there was any
association between the occupation of agriculture and the
tendency to commit crime in a given state over the period 1920-1930.
Table 61 gives all the information at hand bearing on the
question, together with the scheme of symbols used in a short
formula for $\chi^2$ adapted to a 2 × 2 table.


Table 61.—Occupational Distribution of the Adult Male Prison
and Nonprison Populations of a Given State, 1920-1930

Occupational        Mean prison     Mean nonprison
classification      population      population          Total
Agriculture           690 (u)       1,100,000 (v)       1,100,690 (₁n)
Nonagriculture      2,310 (w)         900,000 (x)         902,310 (₂n)
Total               3,000 (n₁)      2,000,000 (n₂)      2,003,000 (N)



$$\chi^2 = \frac{(ux - vw)^2 N}{(n_1)(n_2)({}_1n)({}_2n)}. \tag{106}$$

Substituting in this formula,

$$\chi^2 = \frac{[(690)(900{,}000) - (1{,}100{,}000)(2{,}310)]^2\,2{,}003{,}000}{(3{,}000)(2{,}000{,}000)(1{,}100{,}690)(902{,}310)},$$

$$\chi^2 = 1{,}239.$$


Entering Appendix Table 2 with one degree of freedom, as we
did in Chap. IX, we see that so large a value of $\chi^2$ would occur
by chance much less often than once in 100 times. We may,
therefore, regard the presence of association between the occupation
of agriculture and the commitment of crime in Table 61
as established beyond doubt.
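The short formula for $\chi^2$ in a 2 × 2 table is easy to verify numerically. The sketch below follows the cell lettering of Table 61 ($u$, $v$ in the first row; $w$, $x$ in the second); the function name is illustrative.

```python
def chi_square_2x2(u, v, w, x):
    """Short chi-square formula for a fourfold table laid out as [[u, v], [w, x]]."""
    n1, n2 = u + w, v + x          # column totals
    row1, row2 = u + v, w + x      # row totals
    n = n1 + n2
    return (u * x - v * w) ** 2 * n / (n1 * n2 * row1 * row2)

# Cell frequencies of Table 61.
chi2_table_61 = chi_square_2x2(690, 1_100_000, 2_310, 900_000)
print(round(chi2_table_61))
```

With one degree of freedom, a value of this size lies far beyond the 1 per cent point, which agrees with the conclusion drawn in the text.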

If it seems worth while to go farther than the $\chi^2$ test, and try
to estimate approximately the degree of association in a 2 × 2
table, there are several coefficients available. They are based
on different principles, however, and give different results.
We shall illustrate three such coefficients, namely, Yule's $Q$,
the ordinary coefficient of correlation adapted to fourfold tables,
$r_4$, and the coefficient of tetrachoric correlation, $r_t$. Where one
of them will not meet the needs of a particular problem, another
usually will.

The formula for Yule's $Q$ is

$$Q = \frac{ux - vw}{ux + vw}, \tag{107}$$

where the symbols refer to cell frequencies as shown in Table
61. Let us apply it to the data of Table 61. Substituting, we
have

$$Q = \frac{(690)(900{,}000) - (1{,}100{,}000)(2{,}310)}{(690)(900{,}000) + (1{,}100{,}000)(2{,}310)},$$

$$Q = -.61.$$

According to this coefficient, there is a moderate amount of
negative association between the occupation of agriculture and
imprisonment for crime in Table 61, or, more generally, between
the first column and the first row factors, when the positive
and negative factors (e.g., Prison Population, Nonprison Population,
Agriculture, Nonagriculture) are arranged as in the table.
The result appears reasonable when it is noted that men usually
engaged in agriculture formed only 690/3,000 = 0.23 of the
prison population, but 1,100,000/2,000,000 = 0.55 of the nonprison
population.

Notice that $Q = 0$ if $vw = ux$, or if $u/w = v/x$; that $Q = +1$
if $v$ and/or $w$ is 0; and that $Q = -1$ if $u$ and/or $x$ is 0. In other
words, in Table 61, $Q$ would show (1) zero association if the cell
frequencies represented a purely random distribution of the
table totals; (2) perfect positive association if all of the prison
population, and/or none of the nonprison population was
engaged in agriculture; (3) perfect negative association if none of
the prison population, and/or all of the nonprison population
was engaged in agriculture. The requirement for perfect association
is less stringent than if “and/or” was replaced by “and”
above, but $Q$ is appropriate for treating the data of Table 61, if we
are interested in the proportion of the prison population drawn
from agriculture, as compared with the proportion of the nonprison
population drawn from agriculture.
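A short sketch of Yule's $Q$ may make the limiting cases concrete. The function name is illustrative, and the small tables used for the limiting cases are hypothetical; only the Table 61 frequencies come from the text.

```python
def yules_q(u, v, w, x):
    """Yule's Q for a fourfold table laid out as [[u, v], [w, x]], as in formula (107)."""
    return (u * x - v * w) / (u * x + v * w)

q_table_61 = yules_q(690, 1_100_000, 2_310, 900_000)

# Hypothetical tables illustrating the limiting cases described in the text.
q_zero = yules_q(10, 20, 5, 10)    # u/w = v/x, so no association
q_plus = yules_q(10, 0, 5, 20)     # v = 0 alone is enough for Q = +1
q_minus = yules_q(0, 10, 5, 20)    # u = 0 alone is enough for Q = -1

print(round(q_table_61, 2), q_zero, q_plus, q_minus)
```

Note that a single empty cell drives $Q$ to its limit, which is the "and/or" condition discussed above.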



Should we want to measure the extent to which farmers and
prisoners are strictly identical or exclusive categories, we may
use the formula

$$r_4 = \frac{ux - vw}{\sqrt{({}_1n)({}_2n)(n_1)(n_2)}}, \tag{108}$$

which assumes $v = w = 0$ (Table 61) for perfect positive association,
$u = x = 0$ for perfect negative association, and (like $Q$)
$vw = ux$ for no association. For Table 61,

$$r_4 = \frac{(690)(900{,}000) - (1{,}100{,}000)(2{,}310)}{\sqrt{(1{,}100{,}690)(902{,}310)(3{,}000)(2{,}000{,}000)}},$$

$$r_4 = -.025.$$

In view of the fact that the proportion of agriculturalists in
the prison population was under half that in the nonprison
population, the value of $r_4$ seems to be entirely too low, while
the value of $Q$ is about what would be expected. It seems extreme
to insist that for perfect negative correlation the total nonagricultural
population, but not a single farmer, must be in prison,
as the formula for $r_4$ requires. For other problems, however,
$r_4$ may be more appropriate than Yule's $Q$. This suggests that
the choice of a measure of correlation should be adapted to the
particular problem and interest of the investigator.
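The contrast between $r_4$ and $Q$ on the same table can be reproduced with a few lines. The sketch also checks a standard identity not stated in the text, namely that $r_4^2 = \chi^2/N$ for a fourfold table; the function name and the small "perfect" table are illustrative assumptions.

```python
import math

def fourfold_r(u, v, w, x):
    """Product-moment correlation for a fourfold table [[u, v], [w, x]], as in formula (108)."""
    n1, n2 = u + w, v + x          # column totals
    row1, row2 = u + v, w + x      # row totals
    return (u * x - v * w) / math.sqrt(row1 * row2 * n1 * n2)

u, v, w, x = 690, 1_100_000, 2_310, 900_000   # Table 61
n = u + v + w + x
r4_table_61 = fourfold_r(u, v, w, x)
chi2_from_r4 = r4_table_61 ** 2 * n           # should agree with the chi-square of about 1,239

# Hypothetical table with v = w = 0: both diagonals must be empty for r4 to reach 1.
r4_perfect = fourfold_r(10, 0, 0, 20)

print(round(r4_table_61, 3), round(chi2_from_r4), r4_perfect)
```

The very unequal column totals of Table 61 (3,000 against 2,000,000) are what hold $r_4$ near zero while $Q$ is about $-.61$.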

The two coefficients, Yule's $Q$ and $r_4$, are both designed for the
special case where the frequencies are impressionistically divided
into two groups, or, in geometric terms, roughly collected at two
discrete points. In Table 61, these points are Agriculture and
Nonagriculture for one factor, and Prison population and Nonprison
population for the other.

When the frequencies are distributed along two quantitative
scales, and on each scale they are divided into two groups by a
mark on the scale, and it is desired to find the amount of correlation
between the paired scale values rather than between the
proportions of cases in the two dichotomies, the so-called tetrachoric
method is appropriate if the underlying mathematical
assumptions mentioned below can be met. In Table 63, the
factor, size of household, is reduced to two classes from the
quantitative distribution in Table 62; the other factor, Relief
and Nonrelief, is qualitative, like the categories of Table 61.

There are difficulties in the computation of the tetrachoric
coefficient, $r_t$, but an approximate formula is:

$$r_t = \frac{1}{hk}\left[1 - \sqrt{1 - \frac{2hk(vw - ux)}{N^2HK}}\,\right], \tag{109}$$

where, reading from a table of normal areas and ordinates
(Appendix Table 1),

$h$ is the $x/\sigma$ value at $.5 - {}_1n/N$ or $.5 - {}_2n/N$,
$k$ is the $x/\sigma$ value at $.5 - n_1/N$ or $.5 - n_2/N$,
$H$ is the height of the ordinate at $h$,
$K$ is the height of the ordinate at $k$,

and the other symbols have the same meanings as in Table 61.

The derivation of formula (109) assumes that both of the
series (e.g., size of households and relief-nonrelief) are normally
distributed, that both dichotomies are continuous, that the

Table 62.—Distribution of Rural Relief and Nonrelief Households
by Size, October, 1933*

                                    Households
Size of household          Relief    Nonrelief     Total
10 persons and over           290          246       536
9 persons                     202          213       415
8 persons                     353          336       689
7 persons                     493          560     1,053
6 persons                     633          997     1,630
5 persons                     834        1,322     2,156
4 persons                     846        2,061     2,907
3 persons                     846        2,408     3,254
2 persons                     745        2,430     3,175
1 person                      358          627       985
All households              5,600       11,200    16,800

* Adapted from Thomas C. McCormick, Comparative Study of Rural Relief and Nonrelief
Households, p. 88, Research Monograph II, Works Progress Administration, Division
of Social Research, Washington, D. C., 1935. Mid-point of last interval taken as 11.


^ An alternative formula is 

n ** — cos 

where 


[ X Vt^ 1 
(V ux -f y/vw)^ 

X - 180% 


( 110 ) 


the symbols are arranged as in Table 63, and the sign of $r_t$ is interpreted as in
the case of Yule's $Q$ above.















total frequency of the table is large, that the dichotomous divisions
are not made too far toward the extremes of their distributions,
and that the relationship is linear. If the table is
not normal, the value of $r_t$ is affected by the point of division
of the dichotomies, i.e., by whether each series is divided in the
middle of the scale or at some other point.

In view of these restrictions, it hardly seems legitimate to
apply $r_t$ to Table 61 above. As Fig. 49 shows, the dichotomous
line is drawn at the far upper end of the distribution of criminality,
where the value of $r_t$ is very sensitive to any skewness in the
tail of the curve.

Fig. 49.—Proportion of adult male population in prison, Table 61.


Table 63.—Number of Rural Relief and Nonrelief Households
Containing Less than Four Persons, and Four Persons and
Over, October, 1933*

                                       Frequency
Size of household          Relief         Nonrelief        Total
4 persons and more       3,651 (u)        5,735 (v)       9,386 (₁n)
3 persons and less       1,949 (w)        5,465 (x)       7,414 (₂n)
All households           5,600 (n₁)      11,200 (n₂)     16,800 (N)

* The data in the table should be so arranged that the value of the independent factor (size
of household) increases from the bottom row to the top row, and the value of the dependent
factor (economic independence) increases from the first column (relief) to the second column
(nonrelief).

An inspection of Table 62 suggests that the distribution by
size of household is somewhat skewed. We can test this, however,
by shifting the position of the dichotomous line, and noting
the effect on the value of $r_t$. If $r_t$ remains rather stable, it is
evidence that the distribution is normal enough for the use of the
tetrachoric method. There is no way to judge the normality
of the dichotomous series, relief-nonrelief, except by rationalization.
If we believe that the families in the two groups together
would form an approximately normal distribution when classified
according to some index of dependency, say, per capita family
income, we may be justified in proceeding. The size of household
series is continuous (i.e., without gaps in the scale), and the
same may be said of the relief-nonrelief series, although the
nonrelief households were all nearest neighbors of the relief
households, and represent a restricted nonrelief group. The
dichotomous divisions in Table 63 do not fall too near the tails
of their respective distributions. The condition that the total
frequency should be large is also met. A rough test of linearity
of relationship between the two series in the table could be made
in the same way as was done in the case of biserial $r$ (p. 200), but
we shall not be risking a great deal if we dispense with it.

Turning to the definitions of $h$, $k$, etc., above, we find

$$\frac{{}_1n}{N} = \frac{9{,}386}{16{,}800} = 0.5587, \qquad \frac{n_2}{N} = \frac{11{,}200}{16{,}800} = 0.6667,$$

so that

$$0.5 - 0.5587 = -0.0587, \qquad 0.5 - 0.6667 = -0.1667.$$

From Appendix Table 1 we read,
corresponding to the “Area” entry 0.0587, $h = 0.14+$;
corresponding to the “Area” entry 0.1667, $k = 0.43+$;
and the corresponding ordinates, $H = 0.395$, $K = 0.364$.

From Table 63 we have

$N = 16{,}800$, $v = 5{,}735$, $w = 1{,}949$, $u = 3{,}651$, $x = 5{,}465$.



Substituting in formula (109),

$$r_t = \frac{1}{(.14)(.43)}\left[1 - \sqrt{1 - \frac{(2)(.14)(.43)[(1{,}949)(5{,}735) - (5{,}465)(3{,}651)]}{(16{,}800)^2(.395)(.364)}}\,\right] = \frac{1}{.0602}\left(1 - \frac{17{,}017}{16{,}800}\right),$$

$$r_t = -.21.$$

From Table 63, we see that 65 per cent of relief households
have four or more persons, compared with only 51 per cent of
nonrelief households. We therefore say that the degree of
economic independence of a household is to a slight extent negatively
correlated with the size of the household, as shown by the
value $r_t = -.21$.
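The table look-ups for $h$, $k$, $H$, and $K$ can be mimicked with the standard normal distribution in Python's standard library. The sketch below is hedged in two ways: the function `approx_tetrachoric` assumes a quadratic-root form of the approximation chosen because it reproduces the arithmetic of the worked example, and it may not match the printed formula (109) in layout; and the rounded values $h = .14$, $k = .43$ are used, as in the text, rather than the more exact inverse-normal values.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Cell frequencies of Table 63.
u, v, w, x = 3651, 5735, 1949, 5465
n = u + v + w + x

# Proportions in the first row and the second column, as used in the text.
row_share = (u + v) / n        # 9,386 / 16,800
col_share = (v + x) / n        # 11,200 / 16,800

# Deviates that cut off those proportions (the text reads 0.14+ and 0.43+ from a table).
h_exact = std_normal.inv_cdf(row_share)
k_exact = std_normal.inv_cdf(col_share)

# Ordinates at the rounded deviates used in the text.
h, k = 0.14, 0.43
H = std_normal.pdf(h)
K = std_normal.pdf(k)

def approx_tetrachoric(u, v, w, x, h, k, H, K):
    """Assumed quadratic-root approximation; not necessarily the author's exact formula."""
    n = u + v + w + x
    inner = 1 - 2 * h * k * (v * w - u * x) / (n ** 2 * H * K)
    return (1 - math.sqrt(inner)) / (h * k)

r_t = approx_tetrachoric(u, v, w, x, h, k, H, K)
print(round(h_exact, 3), round(k_exact, 3), round(H, 3), round(K, 3), round(r_t, 3))
```

Under these assumptions the result is about $-.21$, in line with the value reported in the text.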

A quick method of finding the value of $r_t$ is provided by L.
Chesire, M. Saffir, and L. L. Thurstone's Computing Diagrams
for the Tetrachoric Correlation Coefficient. We shall use one of
these diagrams (Fig. 50) to test the normality of the size-of-household
series in Table 62 by recomputing $r_t$ after shifting the
dichotomous line of division from three- to five-person households.
The new groupings are shown in Table 64. The frequencies
are reduced to proportions of the table total, 16,800, by
multiplying the reciprocal of 16,800,

$$\frac{1}{16{,}800} = 0.00005952,$$

into each cell frequency. The proportions are entered in Table 65.
We now take any row or column total that is not greater than
.500 as $a$, any other column or row total at right angles to it as $b$,
Table 64.—Number of Rural Relief and Nonrelief Households
Containing Less than Six Persons, and Six Persons and Over,
October, 1933

                                   Frequency
Size of household          Relief    Nonrelief    Total
6 persons and more
5 persons and less
All households










Table 65.—Frequencies of Table 64 Reduced to Proportions of the
Table Total, for Use with Chesire, Saffir, and Thurstone's
Computing Diagrams

                                   Frequency
Size of household          Relief    Nonrelief    Total
6 persons and more
5 persons and less
All households



and the proportion in the cell common to the $a$ row (or column)
and the $b$ column (or row), $c$. One set of these letters is
indicated in the table. From Fig. 50, the diagram for $a = .33$,
we find at the intersection of the orthogonal¹ lines representing
$b = .74$ and $c = .22$ a value $r_t = -.23$. If in Table 65 $c$ falls
in a positive quadrant, $r_t$ has the sign shown in the diagram; but
if $c$ is in a negative quadrant, the sign indicated in the diagram
is reversed. The signs of the quadrants are marked in Table 65,
where it is seen that $c$ is in a positive quadrant. Therefore,
$r_t = -.23$, which agrees closely with the value of $r_t$ computed for
Table 63 with a different division of the dichotomy for size of
households. So far as this test goes, then, the table seems to be
normal enough to permit the use of the tetrachoric method.
The test should be made for other points of division on the
size-of-household scale, but would still be incomplete because
new subdivisions cannot be tested in the relief-nonrelief series
also.²

Fig. 50.—Sample computing diagram for the tetrachoric correlation coefficient.
(From L. Chesire, M. Saffir, and L. L. Thurstone, Computing Diagrams for the
Tetrachoric Correlation Coefficient, University of Chicago Bookstore, Chicago,
1933.)

¹ At right angles.
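The regrouping behind Tables 64 and 65 can be recomputed from the frequencies of Table 62. In the sketch below the helper `fourfold` and the choice of which totals play the parts of $a$, $b$, and $c$ are assumptions; they are adopted because they reproduce the values $a = .33$, $b = .74$, $c = .22$ quoted in the text.

```python
# Frequencies of Table 62, from "10 persons and over" down to "1 person".
sizes     = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
relief    = [290, 202, 353, 493, 633, 834, 846, 846, 745, 358]
nonrelief = [246, 213, 336, 560, 997, 1322, 2061, 2408, 2430, 627]

def fourfold(cut):
    """Split at `cut`: households with at least `cut` persons form the top row."""
    u = sum(f for s, f in zip(sizes, relief) if s >= cut)
    v = sum(f for s, f in zip(sizes, nonrelief) if s >= cut)
    w = sum(relief) - u
    x = sum(nonrelief) - v
    return u, v, w, x

table_63 = fourfold(4)           # the division used in Table 63
u, v, w, x = fourfold(6)         # the shifted division of Table 64
n = u + v + w + x

a = (u + w) / n                  # relief column total as a proportion (assumed to be a)
b = (w + x) / n                  # "5 persons and less" row total (assumed to be b)
c = w / n                        # cell common to that column and row (assumed to be c)

print(table_63, (u, v, w, x), round(a, 2), round(b, 2), round(c, 2))
```

Calling `fourfold` with other cut points gives the further divisions that the text recommends testing.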

It should finally be observed that a fourfold correlation table
includes some of the basic elements of experimental design.
Thus, in Table 61, we have an independent factor or treatment,
Agriculture; a dependent factor, Imprisonment for Crime; an
experimental group, the Prison population; and a control group,
the Nonprison population. On the other hand, dichotomies are
used instead of classes based on measurement. In Table 61,
sex and (roughly) age have been held constant, and there is
nothing in the method that precludes as rigorous factor control
as seems worth while. Even the broad 2 × 2 table may, therefore,
be a valuable analytical device.


² If it is needed to determine the value of the tetrachoric coefficient, $r_t$,
very precisely, the complete formula may be seen in several texts, e.g.,
Davenport and Ekas, Statistical Methods in Biology, Medicine and Psychology,
4th ed., pp. 105-106, or Peters and Van Voorhis, Statistical Procedures
and Their Mathematical Bases, p. 370; and helpful tables with explanations
are given by Karl Pearson, Tables for Statisticians and Biometricians, 3d ed.,
Part I, pp. xxxvi, xliii, l, liii, 31, 32, 33, 34, 42-52, 52-57; Part II, pp. xliv,
73, 74. Formulas have been derived for the standard errors (see Chap. XII)
of biserial $r$, the coefficient of tetrachoric correlation, and the coefficient of
contingency. The standard error of the coefficient of contingency, $C$, is
hardly needed if the value of $\chi^2$ for the contingency table is referred to a
table of $\chi^2$, as was done above in the section on this coefficient. The formula
may be seen, however, in such texts as Holzinger, Statistical Methods for
Students in Education, p. 278. The standard error of the tetrachoric correlation
coefficient, $r_t$, is also given by Davenport and Ekas, op. cit., p. 108,
and by Peters and Van Voorhis, op. cit., p. 371. G. U. Yule and M. G.
Kendall, An Introduction to the Theory of Statistics, p. 408, show the formula
for the standard error of $r_4$. Somewhat simpler are the standard error
formulas of $r_{bis}$ and $Q$:

$$\sigma_{r_{bis}} = \frac{\dfrac{\sqrt{pq}}{y} - r_{bis}^2}{\sqrt{N}}, \tag{111}$$

$$\sigma_Q = \frac{1 - Q^2}{2}\sqrt{\frac{1}{u} + \frac{1}{v} + \frac{1}{w} + \frac{1}{x}}. \tag{112}$$





Exercises 

1. What are the chief disadvantages in correlating qualitative series, 
as compared with quantitative series? 

2. What preliminary test should be made of a qualitative table 
before applying correlation to it? 

3. What is the amount of correlation between type of college training 
and success in teaching in the following table? 


Two Hundred High School Teachers Classified by Type of College
from Which They Graduated, and by Success in Teaching

Institution                 Successful    Unsuccessful    Total
Teachers college                    58              42      100
University or college               49              51      100
Total                              107              93      200



Defend your choice of a coefficient, and explain the meaning of your 
results. 

4. How much association, if any, is there between the sex of distin¬ 
guished people and the socioeconomic class of their fathers in the 
table below? 

Famous British Men and Women Classified by Social Origin*

Socioeconomic class of father       Men     Women
Nobleman                          1,059
Gentleman                           724        83
Politician, lawyer                  666        61
Soldier, sailor                     490        53
Divine                            1,100        67
Teacher                             274        23
Physician                           396        35
Administrator                       194        12
Writer, artist                      371
Businessman                         929        95
Artisan                             446        38
Laborer, servant                     81         8
Agriculture                         270        18
Total                             7,000       700

* Adapted from Table 4, p. 708. Joseph Schneider, Class Origin and Fame: Eminent
English Women, American Sociological Review, Vol. 6, pp. 700-713, 1940.































Is the association positive or negative? What does the association 
mean in terms of this problem? What should be done about the correc¬ 
tion for broad grouping in this case? Is the value of C significantly 
greater than zero? Explain what this means. 

5. What is the amount of association between the sex of a sample of 
undergraduate students at the University of Wisconsin in 1938-1939 
and their state of residence? What coefficient is most appropriate to 
this problem, and why? Interpret its meaning. 


A Sample of Undergraduate Students, University of Wisconsin,
1938-1939, Classified by Sex and by State of Residence

State of residence     Male    Female    Total
Wisconsin                94        44      138
Other                    17        27       44
Total                   111        71      182



6. Find the amount of association between type of offense and body 
build in the table: 


Criminals Classified by Type of Offense and Body Build*

            First-   Second-                     Burglary  Forgery                  Vs.      Arson
Body        degree   degree                      and       and             Other    public   and all
build       murder   murder   Assault  Robbery   larceny   fraud    Rape   sex      welfare  other    Total
Slender        42       79        7       54       213       57      18     18        31        7      526
Medium        155      358       49      244      1004      260     119     80       127       71     2467
Heavy          77      147       18       81       302      110      44     46        77       15      917
Total         274      584       74      379      1519      427     181    144       235       93     3910

* From A. E. Hooton, The American Criminal, Vol. I, Appendix, IX-8, Harvard University
Press, Cambridge, 1939.


Is the value of the coefficient significantly greater than zero? What 
does the coefficient mean here? 

7. What is the amount of correlation between the age distributions
of females in the neighboring urban and rural counties in the accompanying
table of age distributions?




















Age Distributions of Females in a Rural (Rutherford) and a Near-by
Urban (Mecklenburg) County in North Carolina, 1930*

Age, years      Rural county    Urban county
Under 5                2,553           6,542
5-9                    2,846           7,311
10-14                  2,428           6,424
15-19                  2,247           6,751
20-24                  2,109           7,862
25-29                  1,579           6,990
30-34                  1,201           5,277
35-44                  2,202           8,288
45-54                  1,520           6,199
55-64                    932           2,548
65-74                    515           1,342
75 and over              237             586
Total                 20,369          65,120

* From Fifteenth Census of the United States, 1930, Bureau of the Census.


References

Davenport, C. B., and M. P. Ekas: Statistical Methods in Biology, Medicine,
and Psychology, 4th ed., pp. 97-108, John Wiley & Sons, Inc., New York,
1936.

Elderton, W. P.: Frequency Curves and Correlation, 3d ed., Chap. IX,
University Press (John Wilson & Son, Inc.), Cambridge, Mass., 1938.

Holzinger, Karl J.: Statistical Methods for Students in Education, pp. 271-272,
Ginn and Company, Boston, 1928.

Peters, C. C., and W. R. Van Voorhis: Statistical Procedures and Their
Mathematical Bases, Chaps. XIII and XIV, McGraw-Hill Book Company,
Inc., New York, 1940.

Yule, G. U., and M. G. Kendall: An Introduction to the Theory of Statistics,
Chaps. III and V, pp. 252-253, 408, 410, Charles Griffin & Company,
Ltd., London, 1937.
















CHAPTER XII 


SAMPLING AND SAMPLING ERRORS 

1. Definitions.—In sociological research, it is seldom possible
to study more than a part of the whole, or universe,¹ in which
we are interested. For example, if we want to know whether
the educated or the uneducated in the United States have the
higher birth rate, it would be impractical to find the birth rate
of the millions in each class. A sample would have to be taken
of each group, and the birth rates of the two samples compared.
If the samples were large and properly taken, the sample birth
rates should be rather close to the true rates for the total educated
and uneducated in the country.

A value (e.g., a mean) found from a sample is called a statistic,
whereas the corresponding true or expected value in the universe
is called a parameter. The primary purpose of all sampling is
to learn something about a universe, often to estimate the value
of a parameter from the value of a statistic. There is seldom
any interest in a sample or in the value of a statistic for its own
sake. A good sample is, therefore, one that yields reliable
information about a universe.
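The distinction between a parameter and a statistic can be illustrated with a small simulation. Everything in the sketch is hypothetical: the universe of 100,000 couples, the birth rate of 8 per cent, the sample size of 2,000, and the seed are invented for illustration and are not taken from the text.

```python
import random

random.seed(1939)  # fixed seed so that the sketch is repeatable

# Hypothetical finite universe: 1 = couple with a birth during the year, 0 = without.
universe = [1] * 8_000 + [0] * 92_000
parameter = sum(universe) / len(universe)      # true rate in the universe

# A simple random sample drawn without replacement.
sample = random.sample(universe, 2_000)
statistic = sum(sample) / len(sample)          # rate observed in the sample

print(parameter, statistic)
```

With a sample of this size drawn at random, the statistic should fall close to the parameter, which is the sense in which a good sample yields reliable information about its universe.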

The first step in sampling is to define the universe to be
sampled. Thus we might define the universe of the educated
as consisting of all married couples in the United States living
together through the year 1939 who had successfully passed at
least the first year of high school; and the universe of the uneducated
as corresponding couples who had less schooling than
this, the birth rates to be compared as of the year 1939. The
sociological universe should usually be defined in both space
and time.

Since the universe is made up of events, a definition of the
event is also necessary. In our illustration above, the event is a
married couple with a birth or a married couple without a birth
during 1939. There are thus two kinds of events, couples with
a birth, which may be called successes, and couples without a
birth, which may be called failures. The word “success”
merely designates that particular event among two or more
different kinds of events in which the investigator is chiefly
interested. If we were sampling farmers to find their net annual
incomes in 1939, the event would be a farmer's income, and would
represent a continuous, measured variable. In the case of a
measured variable, there is, of course, no dichotomy of success
and failure, but merely a number of specific values.

¹ Synonymous terms often used are population and parent.

A universe may consist of an infinite or of a finite {limited) 
number of events. If the number of events is very large, the 
universe may be regarded as infinite for practical purposes. 

The events in a universe may already have happened, or may 
be yet to happen. In the former case they are said to be existent; 
in the latter, hypothetical. In our illustration above, at the 
beginning of the year 1939 none of the events (a birth or 
the absence of a birth to a married couple) has happened; at 
the end of the year, all of them have happened. Similarly, heads 
or tails is a hypothetical event before tossing a penny, an existent 
event after tossing. When the universe to be sampled consists 
entirely of completed events, the universe is said to be existent;
when it consists entirely or partly of events yet to come,
it is said to be hypothetical. Prediction, with which social
science must be concerned, is of course possible only with
respect to hypothetical universes, since we do not “predict”
past events. 

It is also important to notice whether the universe to be 
sampled is to be regarded as a unique historical set of events
(situation), as a constant or recurrent situation or system of
causes, or as a changing situation. If we are interested in the
death rate from the influenza epidemic of 1918, we have a unique
universe. But if we attempt to predict the rate of mortality
in Chicago, we assume a continuous or recurrent, i.e., essentially
unchanging, universe. As a matter of fact, strictly continuous 
or recurrent universes never occur in social research, since there 
is constant change in the complex of factors that compose any 
social situation. The important question, therefore, is whether 
the universe can be expected to be approximately recurrent, or 
unchanged, over a period in which we are interested. If so, we 
may be justified in trying to predict what will happen in that
period on the strength of what has occurred. It is sometimes
possible to discover the nature, direction, and rate of change in 
a changing universe, so that we can allow for it in making a 
prediction. 

Finally, we shall find it worth while to distinguish between 
homogeneous and heterogeneous hypothetical universes. A universe
is homogeneous when each hypothetical event has the
same a priori probability of becoming a success or a specified
value of a variable; it is heterogeneous when this probability
is not the same for each hypothetical event. A homogeneous
universe derives from a single set or system of causes, a heterogeneous
universe from two or more distinct sets of causes, as judged
by their effects on the hypothetical events in which we are interested.
When an insurance company sets up a class of “risks,”
composed of, say, males, native white, married, in the legal
profession, aged 25, class medical examination, living in
Michigan, the company is trying to create a homogeneous
universe. Every person or hypothetical event admitted to the
class must be judged alike in respect to certain characteristics
that are believed to be related to the event, death. In other
words, each member of the class must have the same apparent
chance of death. In this way, and by requiring that the conditions
of life for the class must go on essentially in the future
as in the past (e.g., in case one of the insured persons enlists in a 
war, his contract may be modified or canceled), some likelihood 
is created that the system of causes affecting the mortality rate 
of the class will continue each year about the same as the year 
before, except for chance factors. If, however, a number of 
men aged 65 were to be admitted to the risk class originally 
composed of men aged 25, heterogeneity in the hypothetical 
events would at once be introduced. While such a mixed or 
heterogeneous universe might be recurrent if the proportion 
of the two ages were kept constant, it could no longer claim to 
be homogeneous, because the chance of death is known to be 
different for a man aged 25 and a man aged 65. In practice, 
just when a hypothetical universe may be considered homogeneous
is a matter of information and of degree. Of course, no
two persons in a life-table category actually have exactly the 
same chance of death. The more completely the causes that 
are related to the success are controlled and equated from event
to event, however, the more accurate and reliable the prediction
from the sample will tend to be, within limits. Where to stop 
the effort to increase homogeneity is a question of judgment and 
expediency. The more homogeneous the categories of any 
classification are made, the greater their number, and the fewer 
the events that will fall in any one category. While it is usually 
advisable to sacrifice the size of the sample for the sake of homogeneity
to a certain point, diminishing returns set in if the idea
is carried too far. 

2. Taking the Sample.—The events in a sample may be drawn 
from the universe (1) at random, (2) at regular intervals, (3) 
at random from different strata or subclasses of the universe, or 
(4) according to some purposive scheme, such as from the middle 
and ends of a distribution. Thus, (1) we might draw marriage 
certificates at random from an alphabetical list of all of the 
marriage certificates in a file, (2) we might take every fifth 
certificate in order, (3) we might draw a proportional number of 
certificates at random from each separate county and city list, or 
(4) we might take certificates from the top, middle, and bottom 
of the list. The most common method of taking a sample, and 
the one to which most of the statistical theory of sampling applies, 
is the random method. The method of sampling at random proportionally
from within strata, e.g., marriage certificates taken at
random from each county file, is more representative than
random sampling from the total universe, e.g., certificates
taken from a grand list. Unfortunately, however, the sampling
errors of only a few statistics are available in the case of stratified 
sampling. Purposive sampling is seldom as reliable as either 
of the other two methods, and difficulties of determining sampling 
errors are encountered. We shall deal here primarily with 
random sampling, but shall introduce stratified sampling for a 
mean and for a proportion. 
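
The four ways of drawing a sample listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of 1,000 numbered certificates, the two county strata, the sampling fraction, and every function name are hypothetical and do not come from the text.

```python
import random

def draw_random(items, n, rng):
    """(1) Random draw of n items, without replacement."""
    return rng.sample(items, n)

def draw_at_intervals(items, step, rng):
    """(2) Every step-th item, beginning at a randomly chosen position."""
    start = rng.randrange(step)
    return items[start::step]

def draw_stratified(strata, fraction, rng):
    """(3) Random draw of the same fraction from every stratum."""
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

def draw_purposive(items, k):
    """(4) The top, middle, and bottom k items of an ordered list."""
    mid = len(items) // 2
    lo = mid - k // 2
    return items[:k] + items[lo:lo + k] + items[-k:]

rng = random.Random(1939)                 # fixed seed so the run is repeatable
certificates = list(range(1, 1001))       # hypothetical file of 1,000 certificates
strata = {                                # hypothetical county lists
    "county_a": list(range(1, 601)),
    "county_b": list(range(601, 1001)),
}

random_sample = draw_random(certificates, 200, rng)
interval_sample = draw_at_intervals(certificates, 5, rng)
stratified_sample = draw_stratified(strata, 0.2, rng)
purposive_sample = draw_purposive(certificates, 10)
```

Note that the stratified draw takes 120 certificates from the larger county and 80 from the smaller, which is what "proportional" means here.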

A sample is random when at any given draw or trial, considered 
alone, every existent event has an equal chance of being taken, or 
every hypothetical event is equally likely to occur. In other 
words, in a random sample the chance of being “drawn” or
“thrown” is independent of the character of the event. In
addition, a simple sample (also called a Bernoulli sample, after
a Swiss mathematician who studied it) requires that the
probability of drawing or throwing a success or a specified value
shall remain the same from one draw or trial to another.^ It is
theoretically possible to random sample any universe, but a 
simple sample can be drawn only from an infinite universe. 
Suppose we have an existent universe of 1,000 marriage certificates
and wish to take a random sample of 100. Suppose,
further, that 60 of these marriages have ended in divorce. At
the first draw, the probability of taking a marriage that has
ended in divorce is 60/1,000, or 0.060. If we happen to draw a
divorced marriage at the first trial, the probability of getting
a divorced marriage at the second draw will be 59/999, or 0.059;
otherwise it will be 60/999, or 0.06006. Now if the certificates are
drawn entirely independently of the question of divorce, at any 
draw one certificate will have as good a chance of being taken 
as any other in the file, and the resulting sample will be random. 
But because the probability of drawing a given event, say, a 
divorced marriage, changes from one draw to the next in the 
limited universe of 1,000 certificates, the sample will not be 
simple. If, however, after each draw of a certificate from 
the file of 1,000, its number is recorded in the sample and the 
certificate is returned to the file, the number of certificates in 
the file will remain constant and the probability of drawing a 
divorced marriage will not change from draw to draw. By the 
act of replacement the universe becomes infinite. Of course, if 
we happen to draw the same certificate more than once, it will 
have to be accepted in the sample each time it is drawn, if it 
is wanted to maintain an infinite universe. 
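
The arithmetic of this illustration can be checked with exact fractions. The sketch below assumes only the figures given in the text (1,000 certificates, 60 of them divorced); the function name is hypothetical.

```python
from fractions import Fraction

def p_divorced_next(total, divorced, divorced_drawn, drawn):
    """Chance that the next draw is a divorced marriage when the
    certificates already drawn are NOT returned to the file."""
    return Fraction(divorced - divorced_drawn, total - drawn)

first_draw = p_divorced_next(1000, 60, 0, 0)            # 60/1,000
second_after_divorce = p_divorced_next(1000, 60, 1, 1)  # 59/999
second_after_other = p_divorced_next(1000, 60, 0, 1)    # 60/999

# With replacement the file is restored after every draw, so the
# chance is the same at each draw and the sample is simple.
every_draw_with_replacement = Fraction(60, 1000)

print(float(first_draw), float(second_after_divorce), float(second_after_other))
```

Without replacement the probability shifts slightly after each draw, which is exactly why the sample is random but not simple.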

In the case of a hypothetical universe, a simple sample of 
hypothetical events can be drawn only if the universe is homogeneous,
like a life insurance risk group. All the causes that
determine the chance of death that the actuaries have been
able to consider must be the same for each individual in the
group. If a random sample were drawn from a mixture of
two different risk groups, so that, say, persons of different
sexes were included in the sample, the chance of death would
not be the same from one hypothetical event to another (person
to person), and the sample would be further removed from a
simple sample.

^ But it is not assumed that the probability is the same for different kinds
of events or different values, e.g., a person of age 25 and a person of age 30
in a universe of ages.





It follows from the preceding definitions that a simple or 
random sample of an existent universe is not necessarily a 
simple or random sample of the hypothetical universe from 
which the existent universe was derived. But when the existent 
universe itself is a simple or random sample of a hypothetical 
universe, a simple random sample of the existent universe will 
be a simple or random sample of that hypothetical universe also. 

In practice, it is not easy to obtain a random sample. The 
most manageable case is that of a limited existent universe each 
of whose events can be individually identified, such as the list 
of marriage certificates mentioned above. If we take certificates 
or pages of certificates at regular intervals from an alphabetical 
or other random list, say every twentieth certificate or page, the 
first page being chosen at random, the sample should apparently 
be random, because there is no obvious connection between this 
order and the information on the marriage certificates. If 
the interval is not too large, this method should also be more 
representative than other types of random sampling, since 
it takes certificates proportionately from every part of the 
list. There are many other devices for taking a random sample 
from a list. One of the commonest is to number the items, 
place corresponding numbers on tickets in a box, shuffle them,
and draw. Experience has shown, however, that methods
like the above will not always yield a random sample. Mathematically,
the ideal plan is to draw the sample from a table of
random numbers, such as L. H. C. Tippett’s Random Sampling 
Numbers,^ which are combinations of digits taken at random 
from census reports. A specimen page of these numbers is 
shown below (Fig. 51). 

Imagine that we wish to take a random sample of 200 marriage 
certificates from a list of certificates in a state file. The certificates
are filed and numbered consecutively, so that any nth
certificate from the beginning of the list can be quickly located. 
The smallest number printed in the table is 0,000 and the largest 
is 9,999. If the number of events in the universe is close to 
10,000, we can simply go down each column of four figures in 
the table, taking for our sample the first 200 numbers within 
the range of our universe that we meet. When the universe is 

^ Tracts for Computers, Number XV, Cambridge University Press, London,
1927.





Fig. 51. Specimen page of random sample numbers. (From Tracts for
Computers, Number XV, Random Sampling Numbers, ed. by Karl Pearson,
arranged by L. H. C. Tippett, p. 1.) [Page of four-digit numbers not
reproduced.]





much smaller than 10,000, it sometimes saves time to assign 
several table numbers to each event. For example, if there are 
only 2,000 events (e.g., marriage certificates) in the universe, we
may assign to event number 1 the table numbers 0 through 4,
to event number 2 the table numbers 5 through 9, and so on.
If then we read, say, the number 0061 from the table, we draw
event number 13 from the universe (60/5 + 1 = 13).^ The same
event is accepted only once from a limited universe, regardless
of how many times it may be drawn. Also, the number of digits
to read in the table may correspond to the number of digits
needed to express the total of events in the universe. Thus, if
the universe contains 800 events, we may draw three-digit
numbers, e.g., 295, 016, 273, and so on; if the number of events is
500,000, we may draw six-digit numbers, such as 295,266; 416,795;
003,074. The table may be read in any direction or order. 
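
The rule for turning a table number into an event number can be written out as a short sketch. The function names are hypothetical, and apart from 0061 (the number used in the text) the table numbers in the example are made up for illustration; they are not taken from Tippett's table.

```python
def event_number(table_number, numbers_per_event=5):
    """Serial number of the event to which a table number is assigned.

    With five table numbers per event, 0-4 go to event 1, 5-9 to
    event 2, and so on: reduce to a multiple of five, divide by five,
    and add one.
    """
    return table_number // numbers_per_event + 1

def draw_from_table(table_numbers, universe_size, sample_size,
                    numbers_per_event=5):
    """Read table numbers in order, skipping any that fall outside the
    universe and accepting each event only once."""
    chosen = []
    for t in table_numbers:
        event = event_number(t, numbers_per_event)
        if event > universe_size or event in chosen:
            continue
        chosen.append(event)
        if len(chosen) == sample_size:
            break
    return chosen

# 0061 is the text's example; the other readings are hypothetical.
readings = [61, 63, 9999, 4, 61]
sample = draw_from_table(readings, universe_size=2000, sample_size=3)
print(event_number(61), sample)
```

Here 0063 is discarded because it points to event 13 again, which has already been accepted.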

When the individual events of a limited universe cannot be 
identified and labeled, probably the next best thing is to identify 
groups of them, usually on a geographical-time basis. Thus a 
random sample of the farmers of a state, of whom there is no list, 
may be obtained by numbering each township in the state and 
drawing a random sample of townships by one of the methods 
suggested above. Then each township drawn in the sample 
may be visited at a given date, and all the farmers in it taken in 
the sample. Or, if necessary, the sample townships may be 
divided into school districts and a random sample of these districts
drawn before going to the events (farmers) themselves.
When this approach has to be used, the number of groups composing
the universe should be as large as practicable, while the
number of events in each group should be a minimum and 
nearly equal from group to group. For example, if the township 
is the smallest unit for which data are available, a township with 
a large population may be subdivided and represented by two or 
three tickets, instead of by one, in the drawing, so that the 
probability of drawing a township will be roughly proportional 
to the size of its population. 
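
The device of giving a populous township more than one ticket can be sketched as follows. The township names, their populations, and the choice of one ticket per 1,000 inhabitants are all hypothetical, chosen only to show the mechanics.

```python
import random

def make_tickets(townships, people_per_ticket):
    """One ticket per `people_per_ticket` inhabitants (at least one),
    so that the chance of drawing a township is roughly proportional
    to its population."""
    tickets = []
    for name, population in townships.items():
        count = max(1, round(population / people_per_ticket))
        tickets.extend([name] * count)
    return tickets

# Hypothetical townships and populations.
townships = {"A": 900, "B": 2100, "C": 3000, "D": 1000}
tickets = make_tickets(townships, people_per_ticket=1000)

rng = random.Random(0)          # fixed seed so the run is repeatable
drawn = rng.choice(tickets)     # one draw from the box of tickets
print(tickets, drawn)
```

Township C, with three of the seven tickets, is about three times as likely to be drawn as township A or D.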

In dealing with an infinite or a very large universe, it is of 
course not possible to list and label all the individual events, but 

^ In this case, to find the serial number, if the table number is not already 
an exact multiple of five, reduce it to the nearest multiple of five, divide by 
five, and add one. 





it may be feasible to use the group method mentioned in the 
preceding paragraph. For example, if a physical anthropologist 
wanted to sample the white race, he might divide the countries 
occupied by the various branches of this race into small geographical
areas, number them, and draw them at random. He
would then probably have to go to each of the areas drawn, 
further subdivide them, draw a random sample of the small 
subdivisions, and then finally perhaps take a random sample of 
the individuals living in each subdivision. 

Such a plan as the above, however, is not adapted to a hypothetical
universe, like the number of heads or tails that might be
thrown with a penny, or the number of divorces that might occur 
in the United States over some future period of time. The only 
way to draw a random sample in this case is to define a set of 
conditions, or causal system (e.g., social conditions in the United 
States, Jan. 1, 1940 to Jan. 1, 1941), draw at random a number 
of hypothetical events that satisfy the conditions (couples 
married on Jan. 1, 1940), and let the system act to convert them 
to existent events (couples divorced, not divorced on Jan. 1, 
1941); or else wait until the system has produced a large number 
of existent events (couples married Jan. 1, 1940, after Jan. 1, 
1941), and then draw at random as many of them as are needed 
for the sample. In either case, if a simple sample is wanted, it 
is, of course, necessary to make sure that the existent events 
(couples divorced or not divorced Jan. 1, 1941) were derived 
from hypothetical events (couples married Jan. 1, 1940), each of 
which had (on Jan. 1, 1940) essentially the same a priori probability
of becoming a success (divorced couple) throughout
the experiment (Jan. 1, 1940 to Jan. 1, 1941), except for chance 
factors. Notice also that a causal system that does not act 
uniformly over its time cycle must furnish sample events from 
the whole of its cycle, to avoid important omissions. For example,
in determining the death rate of infants during the first year
of life, observations should extend over the complete period of 
12 months, because the death rate is subject to seasonal variation. 

If a heterogeneous hypothetical universe, i.e., a hypothetical
universe in which the chance of success is not the same from one
hypothetical event to another (e.g., a class of life insurance risks
of different ages, where p is the probability of an individual’s
death within the year, and a hypothetical event is a person taken
at the beginning of the year), exists without important change
over a period of time, then a random sample drawn from a large
number of the events at the end of this period will yield an estimate
of the mean probability of death for the mixed class.

When a universe is divided into strata with respect to some 
trait, a proportional^ simple subsample is taken from each stratum,
and these subsamples are combined, the resulting total
sample is called a Poisson sample, in honor of the French mathematician
who described it. Thus, for the purpose of drawing
a Poisson sample, an existent universe of family incomes in New
York City in 1939 may be divided into the classes: Under $500,
$500-$999, $1,000-$1,499, . . . ; or, supposing that we are
interested in divorce, we may define a hypothetical universe of
ever-married women in the city of Philadelphia on Jan. 1, 1940,
consisting of subgroups whose members are alike in respect to
occupation of husband, presence or absence of children, religious
affiliation, length of time since marriage, and so on. If all the
requirements of Poisson sampling are to be met, each stratum 
must constitute an infinite subuniverse. In the case of our 
existent universe of family incomes, this will be approximately 
true if the number of incomes in each class is very large. Any 
hypothetical universe or stratum may be regarded as infinite on 
the assumption that the defined set of conditions theoretically 
acts to produce events without limit. For example, it may be 
reasoned that the conditions that produced a certain percentage 
of divorces among a group of Philadelphia women whose husbands
were skilled laborers, who had borne children, who were
Protestants, who had been married five years, and so on, might 
continue indefinitely to produce the same percentage of divorces 
(except for random errors) among women of this description. 
As a rule, however, it is more realistic to consider how long we 
may expect a hypothetical universe actually to persist without 
important change, and then decide whether the probable number
of events that will be produced in a given stratum within
that period may be regarded as infinite for practical purposes. 
To refer again to our illustration, we might conclude that the 
set of conditions responsible for the divorce rate observed in the 
class of women defined above would probably remain essentially 

^ Preferably also weighted by the value of the standard deviation of the
stratum or subgroup. 





the same no longer than perhaps a decade, but that in 10 years 
several thousand women would come within the class, a number 
great enough to be regarded as infinite without noticeable error. 

If a simple sample is drawn from only one of several strata 
forming a universe, and from it an attempt is made to judge the 
whole universe, the sample is called a Lexis sample, after a German
statistician. The sampling error of a Poisson sample is
less than that of a random sample, while that of a Lexis sample is 
greater. A Lexis sample is seldom taken intentionally, but may 
occur when some important part of the universe is omitted from 
the sample.^ The Poisson sample, on the other hand, is the 
most representative sample that can be taken of sociological 
data, and should be used much more than it now is. 
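
Why a proportional stratified (Poisson) sample has a smaller sampling error than a simple sample of the same size can be seen from a small computation. The sketch below uses the usual textbook variance of a proportion under proportional allocation, with every stratum treated as infinite; that formula is an assumption brought in for illustration rather than one stated in this section, and the three strata with their weights and chances of success are hypothetical.

```python
import math

def simple_variance(strata, n):
    """Variance of a sample proportion under simple sampling: pq/n,
    where p is the chance of success in the universe as a whole."""
    p = sum(weight * p_h for weight, p_h in strata)
    return p * (1 - p) / n

def stratified_variance(strata, n):
    """Variance under proportional stratified sampling: the weighted
    mean of p_h * q_h within strata, divided by n."""
    return sum(weight * p_h * (1 - p_h) for weight, p_h in strata) / n

# Hypothetical strata: (share of the universe, chance of success).
strata = [(0.5, 0.10), (0.3, 0.30), (0.2, 0.60)]
n = 400

se_simple = math.sqrt(simple_variance(strata, n))
se_stratified = math.sqrt(stratified_variance(strata, n))
print(round(se_simple, 4), round(se_stratified, 4))
```

The gain comes entirely from the differences between the strata: if every stratum had the same chance of success, the two standard errors would coincide.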

What has been said above about the sampling of an attribute 
(an unmeasured quality called an event, such as the survival or 
death of an insured person) applies equally to the sampling of a 
variable (a measured quality, such as the net annual income of a 
farm family). In the case of sampling a variable, the parameter 
in which we are usually interested is the mean of the values in the 
universe (e.g., the mean net annual income of the farmers in
Nebraska), although it may be the standard deviation, a correlation
coefficient, or other index.

When the purpose in taking a sample is to use the proportion, 
mean, or other statistic from the sample as an estimate of the 
corresponding parameter in the universe, it is necessary to know the
range of error in the estimate due to sampling. This can be 
found only if the sample is approximately of some standard type, 
such as random, simple, or Poisson. Thus, if we find from a 
simple sample of juvenile delinquents that 21 per cent were 
from broken homes, we are able to estimate with the aid of the 
mathematical theory of sampling that the chances are, say, 19 
to 1 that certain limits, say 15 to 27, will enclose the true percentage
from broken homes in the universe from which the sample
was taken. If the nature of the sample is uncertain, so that we do
not know that it is, say, simple or Poisson, we cannot apply the 
appropriate formulas for finding the errors of sampling, and so 
cannot gauge the amount of error in any statistic estimated from 

^ See Bruce D. Mudgett and S. R. Gevorkiantz, Reliability of Forest 
Surveys, Journal of the American Statistical Association, Vol. 29, pp. 257-281, 
1934. 





the sample. The chief assurance that we can have about the 
nature of a sample must come from a knowledge of the method 
by which it was taken. Thus, we must know that the conditions 
of, say, simple sampling were at least broadly met in drawing the 
sample before we can safely treat it as a simple sample. 
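
The broken-homes illustration in this paragraph gives limits of 15 to 27 per cent at odds of 19 to 1 but does not state the size of the sample. The sketch below shows how such limits might be computed for a simple sample, assuming the normal approximation with 1.96 standard errors for odds of 19 to 1 and a hypothetical sample of 177 delinquents, a size chosen here only because it reproduces the limits quoted.

```python
import math

def proportion_limits(p_hat, n, z=1.96):
    """Limits p_hat +/- z standard errors for a proportion from a
    simple sample of n events (normal approximation)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# 21 per cent from broken homes is the text's figure; n = 177 is assumed.
low, high = proportion_limits(0.21, 177)
print(round(low * 100), round(high * 100))
```

A larger sample would narrow the limits and a smaller one would widen them, the width varying inversely with the square root of n.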

3. The General Theory of Sampling.—In general, the theory 
of sampling that provides a basis for the measurement of sampling 
error is as follows. Suppose that we draw a large number, N, 
of random samples of equal size, n, from a universe of juvenile 
delinquents, and list the number of delinquents with broken 
homes (successes) in each sample. We shall then have a table 
like Table 66. 

This is a sampling distribution of the number or frequency of 
successes per sample. We may find its standard deviation by 
the familiar formula

$$\sigma = \sqrt{\frac{\sum f\,(X - M_x)^2}{N}},$$

where X is the number of successes per sample, M_x is the mean
number of successes per sample, f is the number of samples having
a given number of successes, and N is the total number of
samples. Since this is the standard deviation of the number of
successes from many actual samples, we may call it an empirical
standard error,^ to distinguish it from the standard deviation of a
series where the question of sampling does not enter. We may 
further call the standard error of this formula the empirical 
standard error of the number of successes per sample, to differentiate 
it from the standard error of, say, a mean or correlation 
coefficient. 
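
As a check on the procedure, the empirical standard error can be computed directly from the frequencies of Table 66. The sketch below enters the nonzero part of that table as it is read here from the printed page; the function and variable names are hypothetical.

```python
import math

# Table 66: broken homes per sample -> number of samples (rows with
# zero samples outside the range 8 to 25 are omitted).
table_66 = {
    8: 1, 9: 1, 10: 2, 11: 5, 12: 9, 13: 10, 14: 12, 15: 12, 16: 11,
    17: 8, 18: 9, 19: 5, 20: 6, 21: 4, 22: 3, 23: 1, 24: 0, 25: 1,
}

def empirical_standard_error(freq):
    """Mean and standard deviation of the number of successes per
    sample, weighting each value X by its frequency f."""
    total = sum(freq.values())                           # N, number of samples
    mean = sum(x * f for x, f in freq.items()) / total   # M_x
    variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / total
    return mean, math.sqrt(variance)

mean, std_error = empirical_standard_error(table_66)
print(round(mean, 2), round(std_error, 2))
```

On these figures the mean is about 15.65 broken homes per sample and the empirical standard error about 3.34.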

An empirical standard error like the above has the disadvantage,
however, that it is itself a sample value that is affected
by the number of samples taken, and varies because of random 
errors of sampling. Mathematicians are able to calculate a more 
exact or theoretical standard error provided they are allowed to 
specify the nature of the distribution of the universe values and 
the conditions under which the sample is taken. This enables 
them to lay down requirements which ensure that the parameter,


^ Or probable error, if preferred. From Chap. IX, pp. 160 and 161, it will
be recalled that the probable error is related to the standard error by the
equation P.E. = .6745σ, where o' «» t in our subsequent notation.





say, a frequency, will be distributed in the samples according to 
some established mathematical principle, such as the binomial 
theorem or the normal curve. 

Table 66. Distribution of Broken Homes per Sample of 50 Juvenile
Delinquents in 100 Random Samples

    Broken homes
     per sample        Samples
          0               0
          1               0
          2               0
        . . .           . . .
          8               1
          9               1
         10               2
         11               5
         12               9
         13              10
         14              12
         15              12
         16              11
         17               8
         18               9
         19               5
         20               6
         21               4
         22               3
         23               1
         24               0
         25               1
         26               0
         27               0
        . . .           . . .
         48               0
         49               0
         50               0
       Total            100

Some of the commonest of the standard error formulas that 
are applied in the sampling of attributes assume that the sampling 
distribution is binomial in type. It will be recalled^ that N times 
^ See Chap. IX.






the binomial expansion shows how many of N random samples of 
n events each may be expected to have given numbers of successes
from 0 to n, where the probability of success remains the
same from event to event. If it can be shown that these requirements
were at least approximately complied with in drawing the
events of a sample, we may assume that the sampling distribution 
will be approximately binomial in form. The standard deviation 
of the binomial is well known, and is then the theoretical standard 
error that we are seeking: 

fff = Vn^, 

where p is the constant chance of success in the binomial universe, 
g = 1 — p, n is the number of events in a single sample, and 
/ is the frequency. We have only to substitute in this formula to 
get the standard deviation (called standard error) of the sampling 
distribution. 

In the investigation of sociological attributes, however, there 
is usually available only one sample, rather than a distribution 
of many samples. In that case, if the sample was taken under 
binomial conditions, and its size is large, the best estimate of p, 
the proportion of successes in the universe, is the proportion of 
successes in the single sample. This estimate of p is then used 
in the above formula to compute an approximate theoretical 
sampling error. 

It will be noticed that the binomial theorem merely repeats 
the requirements of the simple sampling of attributes, which we 
have seen can be met only if an existent universe is infinite and 
well mixed, or a hypothetical universe is homogeneous. Because 
of the difficulties in taking a simple sample under many conditions 
in sociological research, it is fortunate that the standard errors 
of simple samples are usually not very different from those of 
random samples, and in any case are somewhat larger. For 
these reasons, investigators often apply simple sampling errors 
to random or even stratified samples, in order to save labor or 
to be on the conservative side when in doubt as to what the 
error formula should be. 

4. Only Large Samples Considered.—In sociological investigations,
many factors that the sociologist is unable to control
usually cause small samples to differ radically from one another.
Small samples are, therefore, not often used in social research.
For this practical reason and for the sake of simplicity, the discussion
in this book is limited to large samples. As a rule, the
standard error formulas given may become rather seriously
inaccurate if applied to samples with fewer than 20 to 25 items,
and are safest when used with much larger samples.¹

5. Standard Error Formulas. a. The Standard Error of a
Frequency.—As just shown, for a simple sample, the formula
for the standard error of a frequency is

$$\epsilon_f = \sqrt{npq}, \qquad (113)$$

where p is the constant probability of drawing a success at any
single draw, q = 1 - p, and f is the frequency in question.
p refers to the probability in the universe, i.e., to the true or
expected probability, and f to the true frequency, but they are
usually estimated from the sample when the latter is large.

We shall illustrate the use of this formula by application 
to the age distribution of an approximately simple sample of 
unemployed in New York City in 1930, shown in Table 67. 


Table 67.—The Distribution of Unemployed Persons by Age, in a
Simple Sample of 100 Unemployed in New York City, 1930

Age, years        Unemployed (f)
15-24                  28
25-34                  23
35-44                  21
45-54                  16
55-64                   9
65 and over             3
Total                 100

What is the range of error in the sample estimate of the relative
number of unemployed in the age class 15-24? If we assume
that the universe of the unemployed in New York is existent
and large enough to be regarded as infinite for our purposes, and
that it does not change appreciably during the process of sampling,
then the probability of drawing an unemployed person
in the age class 15-24 should be constant from draw to draw,
and from one sample of 100 persons to another. Thus the
requirements of simple sampling are met, and we may determine
the error of sampling of the frequency by formula (113). Since
n is as large as 100, we accept 28 as an estimate of the true
frequency in the age class 15-24. Substituting in formula (113),

$$\epsilon_{28} = \sqrt{28\left(1 - \tfrac{28}{100}\right)},$$

$$\epsilon_{28} = 4.5.$$

¹ See Sec. 6.

In a large number of such samples the frequency in the age
class 15-24 is approximately normally distributed, and the
standard error just found is an estimate of the standard deviation
of that normal distribution. Under these conditions, about two
times in three the true frequency in the age class 15-24 will be
included within one standard error above and below the sample
frequency of 28. That is, about two times out of three, we
should expect the true number of unemployed persons in the age
class 15-24 to be contained between the limits 28 ± 4.5, or
between 23.5 and 32.5. If we want more security than this, we
may multiply the standard error by two, getting limits of
28 ± 2(4.5), or 19 to 37, within which about 19 times in 20 the
true frequency will be found.¹ To attain practical certainty, we
may multiply the error by three, giving chances of about 369 to 1
that the true frequency is enclosed between 28 ± 3(4.5), or
between 14.5 and 41.5. Usually, a range of twice the standard
error is regarded as safe enough. In case this range, here
19 to 37, seems too wide to be of much value, and it is wanted
to narrow it, the size of the sample must necessarily be increased,
since the size of the sampling error relative to the frequency
varies inversely as $\sqrt{n}$ (see Sec. 6).

[Figure: limits within which the true frequency in the age class
15-24 years will be enclosed about 19 times out of 20 (in the long
run), as determined from a simple sample of 100 unemployed in New
York City, 1930.]

¹ See Appendix Table 1.

Evidently, if the size of the sample is decreased, the relative
range of the sampling error increases, so that one reason why a
small sample is not suitable for estimating the value of a parameter
is easily seen. For example, suppose that the number of
persons in the sample of Table 67 is only 10, and the frequency
in the age class 15-24 is 3. Substituting in formula (113), with
a factor n/(n - 1) = 10/9 inserted as a correction for the small
size of the sample,

$$\epsilon_3 = \sqrt{\tfrac{10}{9}\left[3\left(1 - \tfrac{3}{10}\right)\right]},$$

$$\epsilon_3 = 1.53.$$

We no longer have confidence in the sample frequency as an
estimate of the universe frequency for use in the formula, but
disregarding this, we find the range of twice the standard error
to be 3 ± 2(1.53) = -0.06 to 6.06, or approximately 0 to 6.
The ratio of twice the error to the frequency is now 3.06/3 = 1.02,
as compared with 9/28 = 0.32 for the larger sample.
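
The two computations above can be checked with a short Python sketch. The function name `se_frequency` and the `small_sample` flag are illustrative choices, not part of the text; the sketch assumes, as the text does, that p is estimated by the sample proportion f/n.

```python
import math

def se_frequency(f, n, small_sample=False):
    """Standard error of a frequency f in a simple sample of n, formula (113).

    Assumption: p is estimated by f / n. When small_sample is True the
    variance is multiplied by n / (n - 1), the correction used in the text.
    """
    p = f / n
    q = 1 - p
    variance = n * p * q
    if small_sample:
        variance *= n / (n - 1)
    return math.sqrt(variance)

large = se_frequency(28, 100)                   # Table 67, age class 15-24
small = se_frequency(3, 10, small_sample=True)  # hypothetical sample of 10

# Ratio of twice the error to the frequency, for each sample size
print(round(large, 2), round(2 * large / 28, 2))
print(round(small, 2), round(2 * small / 3, 2))
```

The relative width of the two-standard-error range comes out roughly three times as large for the sample of 10 as for the sample of 100.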

If it is known that a sample was taken under Poisson conditions
from a stratified universe, the standard error of a frequency
estimated from it may be obtained by the formula

$$\epsilon_f^2 = npq - n\sigma_p^2, \qquad (114)$$

where $p_j$ is the proportion of successes in any stratum, j, of the
universe; p is the mean of the $p_j$'s; $\sigma_p^2$ is the variance of the
$p_j$'s:

$$\sigma_p^2 = \frac{\sum_{j=1}^{k} n_j p_j^2}{n} - p^2; \qquad (115)$$

$n_j$ is the number of events in stratum j; and k is the number of
the strata. As in the case of formula
(113), if the universe values of these statistics are not known, they
are commonly estimated from the sample, provided the frequency
in each stratum of the sample is fairly large (say, 50 or
more).


Table 68.—One Hundred Unemployed Persons in New York City,
1930, Classified by Color and Nativity

                          Number of unemployed
               -------------------------------------------------
Age, years     Total   Native white   Foreign-born white   Negro
                           (1)               (2)            (3)
15-24            28          5                18              5
25-34            23          8                13              3
35-44            21          9                 7              5
45-54            16          9                 1              3
55-64             9          8                 0              3
65 and over       3          3                 0              0
Total           100         42                39             19


Assume that Table 68 above is a Poisson sample, drawn as
previously described, the strata being native white (j = 1),
foreign-born white (j = 2), and Negro (j = 3), as shown in the
table. Let $n_1 = 42$, $n_2 = 39$, and $n_3 = 19$; and let the numbers
in each stratum falling in the age class 15-24 be $f_1 = 5$, $f_2 = 18$,
and $f_3 = 5$, giving $p_1 = 5/42 = 0.12$, $p_2 = 18/39 = 0.46$, and
$p_3 = 5/19 = 0.26$.

Then p = 28/100 = 0.28, q = 1 - p = 1 - 0.28 = 0.72, and from
formula (115)

$$\sigma_p^2 = [42(.12)^2 + 39(.46)^2 + 19(.26)^2]/100 - (.28)^2 = 0.023.$$

Substituting in formula (114),

$$\epsilon_f^2 = 100(.28)(.72) - 100(.023),$$

$$\epsilon_f^2 = 17.86,$$

$$\epsilon_f = 4.23.$$

Notice that this error is slightly smaller than that found on the 
assumption that Table 67 represented a simple sample. 
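
As a rough check on formulas (114) and (115), the sketch below recomputes the Poisson (stratified) standard error from the stratum counts of Table 68. The function name is illustrative. Note one assumption that differs from the worked example: the stratum proportions are carried at full precision rather than rounded to two decimals, so the result is about 4.22 rather than 4.23.

```python
import math

def poisson_se_frequency(stratum_sizes, stratum_successes):
    """Standard error of a frequency from a stratified (Poisson) sample.

    Implements formulas (114) and (115), with every universe value
    estimated from the sample, as the text allows.
    """
    n = sum(stratum_sizes)
    p = sum(stratum_successes) / n
    q = 1 - p
    # Formula (115): weighted variance of the stratum proportions p_j
    var_p = sum(nj * (fj / nj) ** 2
                for nj, fj in zip(stratum_sizes, stratum_successes)) / n - p ** 2
    # Formula (114): binomial variance less the between-strata component
    return math.sqrt(n * p * q - n * var_p)

sizes = [42, 39, 19]     # native white, foreign-born white, Negro
successes = [5, 18, 5]   # numbers in the age class 15-24

print(round(poisson_se_frequency(sizes, successes), 2))
```

Because the between-strata term is subtracted, the result can never exceed the simple-sample value of about 4.49.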

It may be objected that in Table 68 the frequencies in cols. (1), 
(2), and (3) are not large enough to yield very good estimates 
of the true values in the universe. 

b. The Standard Error of a Proportion.—In dealing with Table
67, above, as a simple sample we may think of the frequency 28
in the age class 15-24 as a proportion of the total frequency 100,
p = 28/100 = 0.28, and use formula (116) to find the standard
error of this proportion:

$$\epsilon_p = \sqrt{\frac{pq}{n}}. \qquad (116)$$

Substituting in (116),

$$\epsilon_{.28} = \sqrt{\frac{(.28)(.72)}{100}},$$

$$\epsilon_{.28} = 0.045.$$

Therefore the proportion of unemployed persons in the age
class 15-24 estimated from the sample and the range of error
of the true proportion may be written 0.28 ± 0.045. This means
that the chances are two to one that the true proportion, or
parameter, is not less than 0.235 or more than 0.325.

If we suppose, as we did in the preceding section, that Table 68
is a Poisson sample, the formula for the standard error of a
proportion is

$$\epsilon_p^2 = \frac{pq}{n} - \frac{\sigma_p^2}{n}. \qquad (117)$$

Using the same values as for formula (114), we have,

$$\epsilon_{.28}^2 = \frac{(.28)(.72)}{100} - \frac{.023}{100} = 0.001786,$$

$$\epsilon_{.28} = 0.0423,$$

which is again smaller than the standard error of the same proportion
estimated from Table 67 regarded as a simple sample.
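
Formulas (116) and (117) differ only by the between-strata term, which a small sketch makes explicit. The function name and the `var_p` parameter are illustrative; passing `var_p=0` gives the simple-sample formula (116), and passing the value 0.023 found above gives the Poisson formula (117).

```python
import math

def se_proportion(p, n, var_p=0.0):
    """Standard error of a proportion p from a sample of n.

    With var_p = 0 this is formula (116) for a simple sample; with var_p
    set to the variance of the stratum proportions it is formula (117).
    """
    q = 1 - p
    return math.sqrt(p * q / n - var_p / n)

simple = se_proportion(0.28, 100)
poisson = se_proportion(0.28, 100, var_p=0.023)
print(round(simple, 3), round(poisson, 4))
```

Multiplying either result by n = 100 recovers the corresponding standard error of the frequency.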

c. The Standard Error of an Arithmetic Mean.—Even when the
universe departs considerably from normality, the means of large
samples tend themselves to be normally distributed.

Formula (118) gives the standard error of the arithmetic mean
found from a simple sample:

$$\epsilon_M = \frac{\sigma}{\sqrt{N}}, \qquad (118)$$

where N is the total frequency of the table or the sample, and $\sigma$ is
the standard deviation of the universe, estimated from the
sample.

* Just as a frequency is changed to a proportional frequency by dividing
it by n, so the standard error of the former [formula (113)] is changed into
the standard error of the latter [formula (116)] in the same way:
$\frac{1}{n}\sqrt{npq} = \sqrt{\frac{pq}{n}}$.

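The mean of the simple sample of Table 67, taking the midpoint
of the open interval at 70, is

$$M = A + i \cdot \frac{\sum fd}{N},$$

$$M = 40 + 10\left(-\tfrac{36}{100}\right) = 36.4.$$

The standard deviation is

$$\sigma = i\sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2},$$

$$\sigma = 10\sqrt{\frac{214}{100} - \left(\frac{-36}{100}\right)^2},$$

$$\sigma = 14.2.$$

Substituting these values in formula (118),

$$\epsilon_M = \frac{14.2}{\sqrt{100}} = 1.42.$$

We therefore write for the mean and its standard error

36.4 ± 1.42.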
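
The grouped-data arithmetic can be reproduced directly from Table 67. The sketch below is only an illustration: it assumes class midpoints of 20, 30, ..., 60 (implied by the arbitrary origin A = 40 and interval i = 10 used above) and 70 for the open interval, and it works from the midpoints rather than from coded deviations d, which gives the same result.

```python
import math

def grouped_mean_sd(midpoints, freqs):
    """Mean and standard deviation of a grouped frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    variance = sum(f * x * x for x, f in zip(midpoints, freqs)) / n - mean ** 2
    return mean, math.sqrt(variance)

midpoints = [20, 30, 40, 50, 60, 70]   # assumed class midpoints, Table 67
freqs = [28, 23, 21, 16, 9, 3]

mean, sd = grouped_mean_sd(midpoints, freqs)
se_mean = sd / math.sqrt(sum(freqs))   # formula (118)
print(round(mean, 1), round(sd, 1), round(se_mean, 2))
```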


For a Poisson sample, the standard error of the mean is given
by the formula

$$\epsilon_M^2 = \frac{\sigma^2}{N} - \frac{\sigma_m^2}{N}, \qquad (119)$$

where $\sigma^2$ is the variance of the universe estimated from the total
sample, and

$$\sigma_m^2 = \frac{\sum n_j m_j^2}{N} - M^2, \qquad (120)$$

where $m_j$ is the mean of the jth stratum. As usual, all statistics
are estimated from the sample when the true values in the
universe are not known.

Referring to Table 68, we found that

$$\sigma^2 = (14.2)^2 = 201.64,$$

and $M^2 = (36.4)^2 = 1,324.96$. Let the mean age of the native
whites be $m_1 = 44$, of the foreign-born whites $m_2 = 27.7$, and
of the Negroes $m_3 = 38$. We compute

$$\sigma_m^2 = \frac{42(44)^2 + 39(27.7)^2 + 19(38)^2}{100} - 1,324.96 = 61.76.$$

Substituting in formula (119),

$$\epsilon_M^2 = \frac{201.64}{100} - \frac{61.76}{100} = 1.40,$$

$$\epsilon_M = 1.18.$$

As before, we see that the standard error of the Poisson sample
is smaller than that of the simple sample.
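
A minimal sketch of formulas (119) and (120), using the stratum sizes and stratum means quoted above; the function name and argument names are illustrative only.

```python
import math

def poisson_se_mean(sigma, overall_mean, stratum_sizes, stratum_means):
    """Standard error of the mean of a stratified (Poisson) sample.

    sigma and overall_mean are the standard deviation and mean of the
    total sample; formula (120) gives the variance of the stratum means
    and formula (119) subtracts it from the total variance.
    """
    n = sum(stratum_sizes)
    var_m = sum(nj * mj ** 2
                for nj, mj in zip(stratum_sizes, stratum_means)) / n - overall_mean ** 2
    return math.sqrt(sigma ** 2 / n - var_m / n)

se = poisson_se_mean(14.2, 36.4, [42, 39, 19], [44, 27.7, 38])
print(round(se, 2))
```

The more the stratum means differ from one another, the larger the subtracted term and the greater the gain from stratification.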

The standard error of the mean is most useful in testing the 
significance of the difference between two means, to be treated 
later. 

d. Standard Error of the Standard Deviation.—For a simple
sample drawn from an approximately normally distributed
universe, the standard error of a standard deviation is

$$\epsilon_\sigma = \frac{\sigma}{\sqrt{2N}}, \qquad (121)$$

where $\sigma$ is the standard deviation of the universe, estimated
from the sample.


Table 69.—Scores of 100 Communities on a Community Organization Test

Score (X)  Communities   d     fd    fd²   Accumulated   |X - M|   f|X - M|
               (f)                          frequency
80-99            9       2     18     36       100         41.4      372.6
60-79           17       1     17     17        91         21.4      363.8
40-59           43       0      0      0        74          1.4       60.2
20-39           20      -1    -20     20        31         18.6      372.0
0-19            11      -2    -22     44        11         38.6      424.6
Total          100             -7    117                            1,593.2


Table 67 is a J-shaped rather than a normal distribution, so
it does not lend itself to formula (121). We shall, however,
risk applying the formula to Table 69, which is only moderately
skewed. The 100 communities were taken at random from the
total of some 300 cities of a given size class in the United States,
the name of each community taken being replaced before the
next draw. The sample may, therefore, be regarded as a simple
sample, representing an infinite existent universe of cities like
the 300 cities reported by the census.

The standard deviation of Table 69 is

$$\sigma = 20\sqrt{\frac{117}{100} - \left(\frac{-7}{100}\right)^2} = 21.6.$$

So that, by formula (121),

$$\epsilon_\sigma = \frac{21.6}{\sqrt{200}} = 1.53.$$

And we write for the standard deviation and its standard error
21.6 ± 1.53.

e. Standard Error of a Variance.—Assuming as before a simple
sample from an approximately normal universe, the variance,
$\sigma^2$, has the standard error,

$$\epsilon_{\sigma^2} = \sigma^2\sqrt{\frac{2}{N}}. \qquad (122)$$

The variance of Table 69 is $(21.6)^2 = 466.56$, and its standard
error is

$$\epsilon_{\sigma^2} = 466.56\sqrt{\frac{2}{100}} = 65.78.$$
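
Formulas (121) and (122) can be sketched as two one-line functions (names illustrative). Both assume, as the text does, a simple sample from an approximately normal universe. Carried at full precision, the standard error of the variance comes out near 65.98; the figure 65.78 in the worked example appears to follow from rounding $\sqrt{.02}$ to .141.

```python
import math

def se_sd(sigma, n):
    """Standard error of a standard deviation, formula (121)."""
    return sigma / math.sqrt(2 * n)

def se_variance(sigma, n):
    """Standard error of a variance, formula (122)."""
    return sigma ** 2 * math.sqrt(2 / n)

print(round(se_sd(21.6, 100), 2))        # Table 69 standard deviation
print(round(se_variance(21.6, 100), 2))  # Table 69 variance
```

The two results are related by the factor 2σ, since the variance is the square of the standard deviation.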


f. Standard Errors of Sampling from a Limited Universe.—A
great part of the sampling done in social research is from limited
rather than from infinite universes. It has already been seen
that from a limited universe a random sample can be drawn, but
a simple sample cannot. All the formulas for finding the standard
errors of a simple sample given above, therefore, need a
correction if the sample is drawn at random from a limited
universe, i.e., if the sample is random but not simple. In the
case of a mean, frequency, or proportion, the correction consists
in applying the multiplying factor, $\sqrt{(U - s)/U}$, to the standard
error of a simple sample, where U is the number of events
in the limited universe, and s is the number of events in the
sample, so that s = n or N in our formulas above. It is not
certain that this correction is applicable to standard errors
other than those mentioned.

One or two illustrations will suffice to show how the correction
factor $\sqrt{(U - s)/U}$ is used. In the section dealing with the
standard error of a frequency for Table 67, we found $\epsilon_{28} = 4.5$.
Now if we regard the universe of the unemployed in New York
City from which this sample of 100 was drawn as a limited
universe, consisting in the year 1930 of an average of 300,000
persons, we have

$$\sqrt{\frac{U - s}{U}} = \sqrt{\frac{300,000 - 100}{300,000}} = 0.9998.$$

Multiplying this into the standard error found by assuming an
infinite universe we get .9998(4.5) = 4.499, which for all practical
purposes is the same as before. This suggests that when the
limited universe is quite large there is no need to make the
correction.

Suppose from a limited universe of 1,382 divorces granted in a
certain court in 1939, a random sample of 200 is drawn. From
this sample the mean legal cost of getting a divorce is found
to be $136, and the standard deviation $32. If we regard the
universe as limited, the standard error of $136 is obtained by
multiplying the corrective factor into formula (118), the standard
error of the mean for an infinite universe:

$$\epsilon_M = \sqrt{\frac{U - s}{U}} \cdot \frac{\sigma}{\sqrt{N}} = \sqrt{\frac{1,382 - 200}{1,382}} \cdot \frac{32}{\sqrt{200}} = .925(2.26) = 2.09.$$

In this case, the correction for a limited universe reduces the
standard error of the mean by over 7 per cent.
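
The effect of the correction factor in the two illustrations can be compared with a short sketch; the function name `limited_universe_factor` is an illustrative choice.

```python
import math

def limited_universe_factor(universe_size, sample_size):
    """Multiplying factor sqrt((U - s) / U) for a random sample of s
    events drawn from a limited universe of U events."""
    return math.sqrt((universe_size - sample_size) / universe_size)

# Unemployed in New York City: the universe is so large that the
# correction is negligible.
print(round(limited_universe_factor(300_000, 100), 4))

# Divorce costs: 200 of 1,382 cases sampled, so the correction matters.
factor = limited_universe_factor(1_382, 200)
se_simple = 32 / math.sqrt(200)          # formula (118)
print(round(factor, 3), round(factor * se_simple, 2))
```

The factor departs noticeably from 1 only when the sample is an appreciable fraction of the universe, here about one-seventh.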

g. Standard Errors When the Unit of Sampling Is a Group of
Events, or District; and the Standard Error of a Population Rate.—
When a sample of districts, instead of individual events, is
taken, the district simply replaces the individual event in the
appropriate standard error formula. That is, n or N becomes
the number of districts, rather than the number of events.
The proper standard error formula to use in any given case
depends as before on the conditions under which the sample was
drawn. However, only those standard error formulas are
appropriate for districts that apply to variables, because, disregarding
sampling errors within districts, each district is merely
one value of a variable, such as a proportion or mean, determined
by the events within the district. In finding the mean, the
variance, and so on, of the district values, it is usually advisable
to weight the latter by the number of events in the respective
districts.


Table 70.—Birth Rates in 20 Counties of Wisconsin, 1935

County     Birth rate   Population, 1930   Products     Squares
             (p_i)           (Y_i)         (Y_i p_i)   (Y_i p_i²)
 1           .0186            8,003          148.856     2.768718
 2           .0222           21,054          467.399    10.376253
 3           .0184           34,301          631.138    11.612947
 4           .0125           15,006          187.575     2.344688
 5           .0221           70,249        1,552.503    34.310314
 6           .0175           15,330          268.275     4.694813
 7           .0172           10,233          176.008     3.027331
 8           .0157           16,848          264.514     4.152864
 9           .0205           37,342          765.511    15.692976
10           .0173           34,165          591.055    10.225243
11           .0174           30,503          530.752     9.235088
12           .0225           16,781          377.573     8.495381
13           .0171          112,737        1,927.803    32.965426
14           .0144           52,092          750.125    10.801797
15           .0208           18,182          378.186     7.866260
16           .0162           46,583          754.645    12.225243
17           .0187           27,037          505.592     9.454569
18           .0220           41,087          903.914    19.886108
19           .0178            3,768           67.070     1.193853
20 = n       .0173           59,883        1,035.976    17.922383
Total                       671,184       12,284.470   229.252255


In Table 70 is a random sample of 20 counties of Wisconsin,
showing their birth rates ($p_i = X_i/Y_i$, where $X_i$ = births) in
1935. The mean birth rate for the table is

$$M = \frac{\sum Y_i p_i}{\sum Y_i} = \frac{12,284.470}{671,184} = .01830, \qquad (123)$$

and the variance is

$$\sigma_p^2 = \frac{\sum Y_i p_i^2}{\sum Y_i} - \left(\frac{\sum Y_i p_i}{\sum Y_i}\right)^2 = \frac{229.252255}{671,184} - \left(\frac{12,284.470}{671,184}\right)^2 = .00000667, \qquad (124)$$

so that $\sigma_p = \sqrt{.00000667} = .0025826$.

By formula (118),

$$\epsilon_M = \frac{.0025826}{\sqrt{20}} = .0005775.$$

Or, combining formulas (124) and (118), and adding a term
due to errors of sampling within a district, the standard error
of a population rate is approximately

$$\epsilon_p^2 = \frac{\sigma_p^2}{n} + \frac{pq}{\sum Y_i},^{1} \qquad (125)$$

where p is here the mean birth rate and q = 1 - p. Thus, for Table 70,

$$\epsilon_p^2 = (.0005775)^2 + \frac{(.0183)(.9817)}{671,184} = .0000003335 + .000000026766 = .000000360266,$$

$$\epsilon_p = 0.0006.$$

Since we think of the 71 counties of Wisconsin in 1935 as a
limited universe of birth rates, we should apply the correction²
factor, $\sqrt{(71 - 20)/71} = \sqrt{.7183}$, giving

$$.0006(.8475) = 0.000509$$

as the final standard error. We therefore write the mean birth
rate and its standard error: 0.0183 ± 0.00051, or multiplied by
1,000, 18.30 ± 0.51. The chances are 19 in 20 that the birth
rate per 1,000 for the state as a whole will be enclosed within
the sample range 18.30 ± 2(.51) = 17.28 to 19.32. As a matter
of fact, the birth rate for Wisconsin in 1935 was 17.3, almost at
the bottom limit. This is because the city of Milwaukee, with
a very low birth rate of 15.0, happened to be left out of the sample.
The birth rate for the state without Milwaukee was 18.09, which
is well within the estimated range of sampling error.

¹ More exactly, the last term is

$$\frac{1}{\sum Y_i}\left(p - \frac{\sum Y_i p_i^2}{\sum Y_i}\right) = \frac{1}{671,184}\left(.0183 - \frac{229.252255}{671,184}\right) = 0.000000026756.$$

When the population is large, however, this term is usually negligible.

² Or, using population weights, the correction factor is
[formula illegible in source], where N is the number of districts in
the universe and n is the number of districts in the sample.
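
The Table 70 computation, formulas (123) to (125) followed by the limited-universe correction, can be sketched as below. The function name is illustrative and the county figures are those of Table 70. One caveat: carried without intermediate rounding, the variance comes out near .0000066 rather than the .00000667 of the worked example, which appears to result from squaring the rounded mean .0183; the final standard error is about 0.0005 either way.

```python
import math

# (birth rate p_i, 1930 population Y_i) for the 20 sample counties of Table 70
counties = [
    (.0186, 8003), (.0222, 21054), (.0184, 34301), (.0125, 15006),
    (.0221, 70249), (.0175, 15330), (.0172, 10233), (.0157, 16848),
    (.0205, 37342), (.0173, 34165), (.0174, 30503), (.0225, 16781),
    (.0171, 112737), (.0144, 52092), (.0208, 18182), (.0162, 46583),
    (.0187, 27037), (.0220, 41087), (.0178, 3768), (.0173, 59883),
]

def population_rate_se(rows, districts_in_universe):
    """Population-weighted mean rate, its variance, and its corrected
    standard error for a random sample of districts."""
    n = len(rows)
    total_pop = sum(y for _, y in rows)
    mean = sum(p * y for p, y in rows) / total_pop                 # (123)
    var = sum(y * p * p for p, y in rows) / total_pop - mean ** 2  # (124)
    se_sq = var / n + mean * (1 - mean) / total_pop                # (125)
    factor = math.sqrt((districts_in_universe - n) / districts_in_universe)
    return mean, var, math.sqrt(se_sq) * factor

mean, var, se = population_rate_se(counties, 71)
print(f"{1000 * mean:.2f} +/- {1000 * se:.2f} per 1,000")
```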

This case illustrates one of the dangers of sampling by districts,
namely, that the sample may omit a district with an
extreme value and a very large number of events. This is avoided
when the events are sampled directly. In the case of birth
rates and other population rates, sampling by districts is unavoidable,
but counties like Milwaukee should be subdivided into
several average-sized population districts, each with the given
Milwaukee birth rate. Then the chance of such an omission
from the sample is lessened.

It was assumed above that all the events in each sample
district were used to determine the district value. Sometimes
it is necessary to sample the events in sample districts. This
might be the case if we wanted to study a few hundred farmers'
household accounts in a given state. We would probably draw
a sample of counties, but could not get accounts from all of the
farmers in a sample county. This random sampling of events
would increase the sampling error within the districts. For
example, in formulas (123), (124), and (125), $Y_i$ would become $y_i$,
where $y_i$ is the size of the sample population drawn from district
i, on which the birth rate, $p_i$, is calculated.

6. Control of Sampling Error by Size of Sample.—The number
of items that a sample should contain to yield a satisfactorily
accurate picture of the distribution in the universe from which
it is taken depends on the number of different kinds or classes of
items that it is necessary to distinguish (i.e., on the heterogeneity)
in the universe, on the relative frequencies of the items in each
class, and on whether the items are stratified or mixed. This may
be explained by a simple illustration.

If the universe is limited and consists of two individuals, a
white and a Negro, who are to be examined for skin color,
evidently the sample will fall short of giving a proper picture
of the universe, or of being representative, unless it contains
both of the individuals, or all of the items in the universe.
Should the universe contain thousands of individuals but only
two skin colors, each equally distributed among the population
and subject to no variation in shade from one individual to
another, a perfectly representative sample need still contain
only one individual of a color, each drawn at random from a
color group or stratum. If the same universe is not stratified,
however, but the sample has to be drawn at random from the two
races mixed together, a sample of more than two individuals
should be taken, since otherwise the chance of getting all individuals
of the same color is one in two
$[(\tfrac12 + \tfrac12)^2 = (\tfrac14 + \tfrac14) + \tfrac12]$.

In fact, a fairly large sample (say not less than 25 items) is
now advisable, to lessen the risk that one of the colors will
appear much more often than the other, and so give a false
impression of its relative frequency in the universe. Finally,
suppose that the universe includes many individuals of the same
race, say Negro, but the skin color varies widely among the
individuals. Suppose, further, that we want to learn from a
sample the relative frequency of occurrence of the various shades
of skin color, including the extreme shades that the color takes.
If some shade, say intensely black, exists in only one individual
per 1,000 in the universe, a random sample containing even
as many as 100 individuals will fail to include it nine times in
10 $[(1.000 - 0.001)^{100}]$.
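
The two probabilities quoted in this illustration follow from the binomial theorem and can be verified in a few lines. The function names are illustrative, and the sketch assumes independent draws with a constant chance, i.e., simple sampling.

```python
def prob_class_missed(relative_frequency, sample_size):
    """Chance that a class with the given relative frequency in the
    universe does not appear at all in a simple sample."""
    return (1 - relative_frequency) ** sample_size

def prob_all_same(sample_size, classes=2):
    """Chance that every draw falls in the same class when the classes
    are equally frequent."""
    return classes * (1 / classes) ** sample_size

# A shade found in 1 individual per 1,000, sample of 100
print(round(prob_class_missed(0.001, 100), 3))

# Two equally frequent colors, sample of two and of 25
print(prob_all_same(2), prob_all_same(25))
```

With 25 items the chance of drawing a single color is already below one in ten million, although the sample proportions may still differ a good deal from one-half.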

If it is wanted to use the sample merely to estimate the mean
of the universe distribution, omissions at one part of the scale
may cancel omissions at another part, so that the size of the
sample need not be so great. Yet, for a given degree of accuracy
in the estimate, the size of the sample must be increased as the
variance, $\sigma^2$, of the universe distribution, also estimated from the
sample, increases.

It is theoretically a simple matter to reduce the standard
error to any desired value by merely increasing the size of the
sample, N. For this purpose, we have the formula

$$N_2 = a^2 N_1, \qquad (126)$$

where

$$a = \frac{\epsilon_1}{\epsilon_2}. \qquad (127)$$

$\epsilon_1$ is the value of the original standard error, $\epsilon_2$ is the value of the
desired standard error, $N_1$ is the size of the original sample, and
$N_2$ is the size of sample needed to reduce $\epsilon_1$ to $\epsilon_2$.

In Sec. 5c, above, we found the mean, 36.4, and its standard
error, 1.42, from a simple sample of 100 items. What size of
sample is required to reduce the standard error to one-half its
present value? According to this requirement, $\epsilon_2 = \epsilon_1/2$, so
that a = 2. Substituting in formula (126),

$$N_2 = (2)^2(100) = 400.$$

Notice that when the divisor of the original standard error is
named, we have only to multiply the size of the sample by the
square of that divisor. That is, to divide the standard error by
two, we multiply the size of the sample by four. This rule
applies to any common standard error except the standard
error of a frequency. The easiest way to deal with the standard
error of a frequency is to substitute for it the standard error
of the equivalent proportion, for which the above rule holds.
For example, in Sec. 5b, above, the frequency, 28, in Table 67
was changed to the proportion, 0.28, for which the standard
error was estimated to be 0.045. To reduce this error, or the
relative error of the corresponding frequency, to one-third of
its value, we multiply $N_1 = 100$ by $(3)^2 = 9$, giving 900 as the
size of the sample required.
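
Formulas (126) and (127) amount to a one-line rule, sketched here with illustrative function names. The last lines check the rule against formula (116) for the proportion 0.28, assuming the proportion itself stays the same in the larger sample.

```python
import math

def required_sample_size(n1, divisor):
    """Formula (126): size of sample needed to divide the standard
    error found from a sample of n1 by the given divisor a."""
    return divisor ** 2 * n1

def divisor_from_errors(original_error, desired_error):
    """Formula (127): a = original error / desired error."""
    return original_error / desired_error

print(required_sample_size(100, 2))   # halve the error of the mean
print(required_sample_size(100, 3))   # cut the error of 0.28 to one-third

# Check with formula (116): the error at n = 900 is one-third that at n = 100
se_100 = math.sqrt(0.28 * 0.72 / 100)
se_900 = math.sqrt(0.28 * 0.72 / 900)
print(round(se_100 / se_900, 6))
```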

The problem of determining the proper size of fairly large
samples may be approached in terms of confidence or fiducial
limits. That is, we may require the sample to be of such a size
that about 2P times in 100 the value of a parameter in which we
are interested will be enclosed within a specified range. Using
again the example of Sec. 5c, let us say that it is wanted to take a
sample of such size that the chances are about 95 in 100 that the
parameter will be enclosed within a range that extends on either
side of S a distance equal to 10 per cent of the value of S. We
then require

$$\frac{\sigma}{\sqrt{N_2}} \cdot \frac{x}{\sigma} = p'S, \qquad (128)$$

where, for this particular problem, S = M = 36.4, $\sigma$ = 14.2,
x is a mean deviate of S, p' is one-half the width of the range
expressed as a percentage of the value of S, $x/\sigma$ is the value read
from a table of normal areas corresponding to one-half of the area
enclosed by the specified confidence limits, P = .95/2 = 0.475,
and $N_2$ is the required size of the sample. Assuming that the
distribution of sample means from large samples is approximately
normal in form, we turn to Appendix Table 1, and find that
2P = 2(.475) = 0.95 between the points $x/\sigma = \pm 1.96$, so that
$x/\sigma = 1.96$. Substituting in formula (128), we have

$$\frac{14.2}{\sqrt{N_2}}(1.96) = .10(36.4),$$

$$\sqrt{N_2} = 7.64,$$

$$N_2 = 58.4.$$

To check this, we write

$$S \pm \frac{x}{\sigma} \cdot \frac{\sigma}{\sqrt{N_2}}, \qquad (129)$$

$$36.4 \pm 3.64.$$

Since 3.64 is 10 per cent of 36.4, we have the result desired.

Notice, as a further check, that in our solution the standard
error of the mean is only $14.2/\sqrt{58.4} = 1.86$. If the mean age
varies by 10 per cent of its value, however, it will vary by ±3.64
years, which is $3.64/1.86 = 1.96\epsilon_M$. But ordinates of the normal
curve at the points ±1.96 standard errors include 95 per cent of
the area of the curve. Therefore, the chances are about 95 in 100
that the true mean will be found within ±10 per cent of the value
of the mean of our sample.

[Fig. 63.—Showing 95 per cent confidence limits for the mean of a random
sample of 58 items. About ninety-five chances in a hundred the true mean will
be enclosed within the limits 32.76 and 40.04.]
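
Solving formula (128) for $N_2$ gives $N_2 = [(x/\sigma)\,\sigma/(p'S)]^2$. The sketch below applies that rearrangement; the function name and the use of Python's `statistics.NormalDist` in place of Appendix Table 1 are assumptions made for illustration.

```python
from statistics import NormalDist

def sample_size_for_limits(sigma, statistic, half_width_fraction, confidence):
    """Sample size at which the confidence limits extend the given
    fraction of the statistic on either side of it (formula (128)
    solved for N2), assuming normally distributed sample means."""
    # Normal deviate x/sigma enclosing the stated central area
    deviate = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (deviate * sigma / (half_width_fraction * statistic)) ** 2

n2 = sample_size_for_limits(sigma=14.2, statistic=36.4,
                            half_width_fraction=0.10, confidence=0.95)
print(round(n2, 1))
```

In practice the result would be rounded up to the next whole item, here 59.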






7. Error in Mean vs. Individual Predictions from a Regression
Equation.—Interest often centers in predicting averages rather
than individual values from a regression equation.¹ Thus, out
of 20 counties with birth rates of 18 per 1,000, how far does the
mean observed death rate differ from the most probable death
rate predicted by the regression equation? The scatter of the
observed means of such samples around the predicted value
in this case depends upon the size of the sample, N, as well as
upon the value of r, and may be found from the equation

$$\epsilon_p = \frac{S_y}{\sqrt{N}}. \qquad (130)$$

It appears from this that the scatter in predicting the mean
value of Y corresponding to a given value of X, or to the midpoint
of an X class interval, is reduced, compared to the scatter
in calculating any individual value of Y, in proportion to the
square root of the size of the sample from which the mean is
found. For the data of Table 50 of Chap. X, if we take a
sample of 20 counties all with approximately the same birth
rate, equation (130) gives

$$\epsilon_p = \frac{1.86}{\sqrt{20}} = 0.416,$$

which is less than a quarter of the size of the standard error of
estimate, $S_y$ (= 1.86), that governs the prediction of a death
rate from a birth rate in the case of a single county.
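
Equation (130) is simple enough to tabulate for several sample sizes, which shows how quickly the scatter of a predicted mean shrinks. The function name is illustrative; the value 1.86 is the standard error of estimate quoted above.

```python
import math

def se_predicted_mean(se_estimate, n):
    """Equation (130): scatter of the observed mean of n cases around
    the value predicted by the regression equation."""
    return se_estimate / math.sqrt(n)

for n in (1, 5, 20, 100):
    print(n, round(se_predicted_mean(1.86, n), 3))
```

With n = 1 the formula reduces to the standard error of estimate for a single county.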

8. Representativeness of a Sample.—The test of the goodness
of a sample is simply the test of its representativeness. If we
knew the value of the parameter, we could measure the representativeness
of the sample in terms of the percentage deviation
of the statistic from the parameter. Thus, if s is the statistic
and S is the corresponding parameter, the formula for measuring
the representativeness, $R_p$, is

$$R_p = \left[100 - 100\left(\frac{S - s}{S}\right)\right] \text{ per cent}, \qquad (131)$$

where we take S - s if S > s.

¹ See Chap. X, Tables 60 and 61.

The value of the parameter is seldom known, however, for if
it were, it is not likely that a sample would be taken. This
means that there is generally no direct way of measuring the
representativeness of a sample. The nearest approximation
would be to take several additional samples, each equivalent in
method of drawing and (preferably) in size to the original sample.
Then, in addition to noting the variation of certain statistics
from one of these samples to another, we might pool the samples
to obtain an average statistic, or average the statistics found from
them, and substitute this average value for S in formula (131),
above. But if this were done, we would of course at once
abandon the statistic from the original sample in favor of the
average statistic from the several samples, whose representativeness
would still be unknown. As a rule, therefore, the best that
we can do in the way of formulating an index of representativeness
is to rely on a large sample, and, where possible, stratification
of the universe, and measure the probable maximum deviation
of the statistic from the parameter in terms of, say, two standard
errors of the statistic ($\epsilon_s$). This permits us to say that the
probable minimum representativeness of the statistic is

$$R_p = \left[100 - 200\left(\frac{\epsilon_s}{s}\right)\right] \text{ per cent}. \qquad (132)$$

In Sec. 5c, above, we found the mean age of a simple sample of
100 ages to be 36.4 years, and the standard error of this mean to
be 1.42 years. If we knew that the mean age in the universe
from which the sample was taken was 37.5 years, we would find
the representativeness of the sample by formula (131) to be

$$R_p = \left[100 - 100\left(\frac{37.5 - 36.4}{37.5}\right)\right] = 100 - 2.9 = 97.1 \text{ per cent}.$$

But if we did not know the parameter value, we would estimate
the probable minimum representativeness by formula (132),

$$R_p = \left[100 - 200\left(\frac{1.42}{36.4}\right)\right] = 100 - 7.8 = 92.2 \text{ per cent}.$$
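
Both indexes can be sketched as short functions. The names are illustrative, and the absolute value in the first function is an assumption that extends formula (131) to the case where the statistic exceeds the parameter.

```python
def representativeness(statistic, parameter):
    """Formula (131): 100 minus the percentage deviation of the
    statistic from a known parameter."""
    return 100 - 100 * abs(parameter - statistic) / parameter

def minimum_representativeness(statistic, standard_error):
    """Formula (132): probable minimum representativeness, taking two
    standard errors as the probable maximum deviation."""
    return 100 - 200 * standard_error / statistic

print(round(representativeness(36.4, 37.5), 1))
print(round(minimum_representativeness(36.4, 1.42), 1))
```

The second index needs only quantities available from the sample itself, which is why it serves when the parameter is unknown.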

An indirect but important method of jud^ng the representa¬ 
tiveness of a sample makes use of the circumstance that although 




252 


ELEMENTARY SOCIAL STATISTICS 


the value of the attribute or variable for which we are sampling 
is not known in the universe, other universe values may be 
known; and if the sample can be shown to be representative of the 
latter, it is likely to be representative of the former also. As an 
illustration of this, we may draw a random sample of families in 
an Alabama county for the purpose of determining by field visits 
the percentage whose annual income falls below a certain minimum
level. After the sample is obtained, it may be compared
with the figures of the latest Federal census for the given county in
regard to median size of family, the proportions of families having
different numbers of children under 10 years of age, the percentage
of families that do not own their home, and the median
rental paid. If a reasonably close agreement is found between 
the sample and the census population in these respects, the 
sample may usually be regarded as satisfactory also for the study 
of incomes. 


Exercises 

1. Define in both time and space (1) an infinite universe, (2) a limited 
universe, (3) a hypothetical universe, choosing in each case a universe 
of interest to social scientists. 

2. Give an example of a universe of social attributes, and define the
event and the "success."

3. Illustrate a universe of a social variable. 

4. Draw a sample of events or values from an actual known social 
universe, so that the sample will be (1) random, (2) simple, (3) Poisson 
(stratified). 

5. Draw a random sample of districts from a known social universe 
of your own choosing. 

6. In Table 34 of Chap. VIII, what is the standard error of the
frequency in the class X = 0? What does it mean?

7. In Table 34 of Chap. VIII, what is the standard error of the proportion
of prisoners with no previous arrests? How does this standard
error compare with that for a frequency found in Exercise 6 above?

8. Ten thousand marriage certificates issued in the same month in 
five large American cities are taken as the universe, and a random 
sample of 500 certificates is drawn from them. After one year, it is 
found that 78 of the 500 marriages are divorced. What is the mean 
probability of divorce in this heterogeneous universe of marriages, 
and what is its approximate standard error? 

9. Judging from the sample in the following table, what is a range
within which the true number of Orientals immigrating to the United
States in the year 1929, expressed as a percentage of the total Oriental
immigration over the five-year period, 1925-1929, will fall 95 times out
of 100? Compare the standard errors found on the assumptions of a
simple sample from an infinite universe, a random sample from a
limited universe (total Chinese immigrants, 6,090, total Japanese
immigrants, 2,314), and a Poisson sample from a limited universe. If
the sampling was random and proportional between Chinese and
Japanese, which of these assumptions seems preferable, and why?

Sample of 740 Chinese and Japanese Immigrants to the United States,
by Year of Arrival

Year              Chinese    Japanese    Total
1929                102         65        167
1928                115         49        164
1927                105         43        148
1925 and 1926       187         74        261
Total               509        231        740

10. What is the standard error of the mean in the table of Exercise 3 
of Chap. XIII for urban families? How do you interpret it? 

11. Below are given the number of children under 5 years of age and
the number of women aged 15-45 years for each of 20 random counties
in Wisconsin in 1930, with the resulting fertility ratios.

Within what range will the fertility ratio for the state of Wisconsin
fall, 95 times out of 100? (Note: the fertility ratio in the State of
Wisconsin as a whole in 1930 was about 0.41. There are 71 counties
in the state.)

Fertility Ratios and Basic Data, 20 Random Counties in Wisconsin, 1930*

County    Children under    Women 15-45    X_i / Y_i
code         5 = X_i           = Y_i
  1             731             1,523         .48
  2           1,968             4,331         .45
  3           3,463             7,084         .49
  4           1,243             2,723         .46
  5           6,998            16,408         .43
  6           1,562             3,157         .49
  7             953             1,924         .50
  8           1,619             3,526         .46
  9           3,707             8,118         .46
 10           3,330             6,765         .49
 11           2,536             6,166         .41
 12           1,745             3,339         .52
 13          10,016            27,401         .37
 14           4,504            10,889         .41
 15           1,757             3,671         .48
 16           3,707            10,546         .35
 17           2,796                           .50
 18           3,758             9,692         .39
 19             395               648         .61
 20           5,364            13,330         .40
Total        62,152           146,791         .42

* From Fifteenth Census of the United States, 1930, Population, Vol. III, Part 2, pp.
1314-1319.


12. Within what range will the standard deviation of the fertility 
ratios in the universe fall 95 times in 100, according to the random 
sample in the table of Exercise 11 above? Can the standard error of 
the standard deviation be applied to urban families in the table of 
Exercise 3 of Chap. VIII? Explain. 

13. What size sample of rural nonfarm families in the table of Exercise 3
of Chap. XIII is needed to reduce the standard error to one-half
its value?

14. In the table of Exercise 9 above, what size sample of Japanese is 
required to confine the true value of the proportion of immigrants in 
the year 1929 within 5 per cent of the observed value 99 times in 100 
(i.e., within 99 per cent confidence limits)?

15. Measure the probable minimum representativeness of the mean 
score in Table 69, above. 


References 

Baten, W. D.: Mathematical Statistics, Chaps. XI and XV, John Wiley &
Sons, Inc., New York, 1938.

Croxton, F. E., and D. J. Cowden: Applied General Statistics, Chaps. XII
and XIII, Prentice-Hall, Inc., New York, 1939.

Mills, F. C.: Statistical Methods, rev. ed., Chaps. XIV and XVIII, Henry
Holt and Company, Inc., New York, 1938.

Peters, C. C., and W. R. Van Voorhis: Statistical Procedures and Their
Mathematical Bases, Chaps. V, VI, and XIV, McGraw-Hill Book
Company, Inc., New York, 1940.

Tippett, L. H. C.: The Methods of Statistics, 2d ed., Chaps. II, III, IV, and
VIII, Williams and Norgate, Ltd., London, 1937.

Treloar, A. E.: Elements of Statistical Reasoning, Chaps. X, XI, XIV, and
XV, John Wiley & Sons, Inc., New York, 1939.

Yule, G. U., and M. G. Kendall: An Introduction to the Theory of Statistics,
Chaps. XVIII-XXII, Charles Griffin & Company, Ltd., London, 1937.










CHAPTER XIII 

THE SIGNIFICANCE OF DIFFERENCES 

1. The Meaning of Tests of Significance.—It has been seen 
that the value of a statistic estimated from a random sample usually
differs somewhat from the true value, or parameter, in the
universe from which the sample is drawn. Similarly, the values
of a statistic, such as the mean, yielded by two or more random
samples from the same universe, will almost never be exactly
the same, and may sometimes be quite far apart. Such variations,
however, are due merely to chance errors of sampling
and imply no actual differences. On the other hand, samples 
from different universes yield statistics of different values which 
represent real differences in the parameters. It therefore becomes 
a matter of great importance in investigations based on sampling 
to distinguish between real differences and accidental ones. 

If we could be certain that two or more samples were taken 
at random from the same universe or from different universes, 
there would, of course, be no problem. In most of the practical
sampling work done in the social sciences, however, the investigator
cannot feel entirely confident that his samples are random,
and he knows so little about the universes from which they are
taken that he cannot say whether these universes are essentially
the same or different. For example, if we try to take random
samples of 500 persons each from the total population of a city
like Chicago, it will not be easy to insure that the selection will
be random, or even to guarantee that the persons drawn will all
belong to the population of Chicago. If the several samples are
not taken on the same day or even at the same hour, the populations
sampled may be radically different, because of the traffic in
and out of the city in the mornings and evenings, on week ends
and holidays. As a consequence of such uncertainties, an
investigator feels the need for some kind of test that will lend
additional security to any inferences that he may draw from
samples. The development of such tests, based on the mathematical
theory of probability, constitutes the major part of
present-day statistical method.

In Chap. XII, it was usually assumed that if the sampling was 
random or simple from a normally distributed universe, the 
statistic itself would be normally distributed over many samples. 
By the use of the standard error, it then became possible to 
estimate from the normal curve the probability that the parameter
would be enclosed within specified limits. It is now necessary
only to extend these ideas to the differences between statistics,
and to direct attention to the common rule that if a
difference as large as the one observed might occur by chance 
no oftener than five times in 100, it is regarded as a real 
difference. In that case, the difference is said to be significant. 

The differences that are tested by this method are of two general 
kinds. The first is the difference between the value of a statistic 
and the value of a known or hypothetical parameter. For 
example, can a group of mothers with a mean age of 27 years be a 
random sample from a universe of mothers whose mean age is 
24 years? Or, can a correlation coefficient, r = .34, be a random 
statistic from a universe in which there is no correlation, i.e., 
where r = 0? The second kind of difference that is frequently 
dealt with is the difference between the statistics from two or 
more samples. Thus, can two groups of mothers, one with a 
mean age of 27 years and the other with a mean age of 31 years, 
be random samples from the same universe? If the test shows 
that the difference is significant, it is inferred that the answer 
to the above questions is negative, on the grounds that a negative 
answer is highly probable. If the test fails to show a significant 
difference, the sample is regarded as a random sample from a 
given universe, or two samples are regarded as random samples 
from the same universe, until tests applied to larger samples show 
the contrary. If it is not positively known that the samples are 
random, a nonsignificant test at least allows us to say that the 
observed differences are no greater than might occur with random 
samples. 

If a difference is defined as real when the probability of its
occurring by chance is as low as five in 100, we are said to be
using the 5 per cent level of significance. The fixing of this critical
probability is arbitrary and a matter of convention. The 5 per
cent level is rather widely used at present, but the 1 per cent
level is preferred when there is need to be more conservative.
Reference to Appendix Table 1 will show that 5 per cent of the
area of the normal curve lies beyond ordinates erected at about
two standard errors on each side of the mean, while 1 per cent
of the area falls beyond ordinates at plus and minus 2.58 standard
errors (see Fig. 54). It was formerly the practice to insist that
an observed difference must fall as far out as three standard
errors. At that point the probability is only about 27 in 10,000
that so large a difference might occur by chance in either direction.
This is too stringent for ordinary purposes, because it
causes the investigator to withhold judgment in an unnecessarily
large proportion of cases.


Fig. 54.—Five per cent of the area of the normal curve taken at the positive end
only, and divided equally between the positive and negative ends.

Notice that since the 5 per cent level of significance, for 
example, includes 2.5 per cent of the area of the normal curve 
at each end of the X scale, it implies that the probability of 
getting either a positive or a negative difference is sought. If 
it is desired to find the probability of getting say a positive 
difference only, the reading is limited to the positive end of the 
scale (see Fig. 54). 
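
The tail areas quoted in this section can be reproduced without a printed table. The sketch below assumes only the normal curve itself and uses the complementary error function from the Python standard library; the function names are illustrative.

```python
import math


def two_tailed_area(z):
    # Area of the normal curve beyond -z and +z taken together.
    return math.erfc(abs(z) / math.sqrt(2.0))


def one_tailed_area(z):
    # Area of the normal curve beyond +z only.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for z in (1.96, 2.0, 2.58, 3.0):
    print(z, round(two_tailed_area(z), 4), round(one_tailed_area(z), 4))
```

At three standard errors the two-tailed area is about .0027, the "27 in 10,000" mentioned above, and a one-tailed reading of 5 per cent is reached at about 1.645 standard errors rather than 1.96.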

2. The Significance of a Correlation Coefficient.—Suppose we
have a correlation coefficient r = .34 from a simple sample of
30 pairs of variates from normal universes, one variate being
scores on an I.Q. test and the other the scores of the same individuals
on a personality test. Is the value of the observed r here
so small that it might occur as a random error in a sample from a
universe in which there is no correlation?

To answer this question, we test the difference of the observed
value of r from zero. Appendix Table 4 has been designed to
provide a ready-made test of this sort in the case of the correlation
coefficient. From it the value of the coefficient that is
just significantly different from zero at the 5 per cent (or the
1 per cent) level of significance may be read off at once, and
compared with the observed value. The table is entered with
N - 2 degrees of freedom, which in this case are 30 - 2 = 28.
At the 5 per cent level we find that an r = .36 is just significant.
Since the value of our r (= .34) is slightly smaller than this, it
might occur by chance a little oftener than five times in 100.
If we are governed strictly by the 5 per cent criterion, therefore,
we cannot accept an r = .34 as significantly different
from zero.

For simple samples from a normal universe so large that N is
not covered by Appendix Table 4, formula (133) is convenient
to test the hypothesis that the observed value of r is not different
from zero.

$$\epsilon_r = \frac{1}{\sqrt{N}}. \qquad (133)$$

In the example above,

$$\epsilon_r = \frac{1}{\sqrt{30}} = 0.18.$$

The ratio of the statistic r to its standard error, called the
critical ratio (C.R.), is

$$\text{C.R.} = \frac{.34}{.18} = 1.89.$$

Entering a table of normal areas (Appendix Table 1) with
C.R. = x/σ = 1.89, the probability is found to be about six
in 100 that a larger value of r than that observed might occur
because of random errors of sampling. Again we find that the
value r = .34 is not quite significantly different from zero. It
might have come from a universe in which there was no correlation
at all.
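
As a rough check on this arithmetic, the sketch below applies formula (133) to the same figures and reads the two-tailed normal area directly instead of from Appendix Table 1. Carrying the standard error unrounded gives a critical ratio of about 1.86 rather than 1.89, which does not alter the verdict.

```python
import math


def critical_ratio_of_r(r, n):
    # Formula (133): standard error of r under the hypothesis of no correlation.
    std_error = 1.0 / math.sqrt(n)
    return r / std_error, std_error


cr_r, se_r = critical_ratio_of_r(0.34, 30)
# Two-tailed area of the normal curve beyond the critical ratio.
p_r = math.erfc(cr_r / math.sqrt(2.0))
print(round(se_r, 2), round(cr_r, 2), round(p_r, 3))
```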

3. The Significance of g1 and g2.—In a problem in Chap. IX
we found the measure of skewness of a certain distribution
to be $g_1 = 1.17$. The standard error of $g_1$ in large samples is
approximately

$$\epsilon_{g_1} = \sqrt{\frac{6}{N}}. \qquad (134)$$

Substituting in formula (134), with N = 252,

$$\epsilon_{g_1} = \sqrt{\frac{6}{252}} = 0.154.$$


If now we divide the value of $g_1$ by its standard error, we get the
critical ratio 1.17/0.154 = 7.6. Since this is much more than
two standard errors, chance is ruled out, and we conclude that the
distribution could not reasonably be regarded as a random sample
from a normal universe.

Let us next test the value of the measure of kurtosis, $g_2 = 0.86$,
found for the same distribution in Chap. IX. The standard
error of $g_2$ for large samples is

$$\epsilon_{g_2} = \sqrt{\frac{24}{N}}. \qquad (135)$$

For N = 252,

$$\epsilon_{g_2} = \sqrt{\frac{24}{252}} = 0.309.$$

The critical ratio is therefore 0.86/0.309 = 2.78. Thus the
peaked distribution in question could not have been drawn at
random from a normally distributed universe.¹
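
The two tests of this section can be run together. The sketch below assumes the large-sample standard errors √(6/N) and √(24/N) for the skewness and kurtosis measures, which agree with the values 0.154 and 0.309 used in the text for N = 252.

```python
import math

N_CASES = 252          # size of the distribution from Chap. IX
g1, g2 = 1.17, 0.86    # observed skewness and kurtosis measures

se_g1 = math.sqrt(6.0 / N_CASES)    # formula (134)
se_g2 = math.sqrt(24.0 / N_CASES)   # formula (135)

cr_g1 = g1 / se_g1
cr_g2 = g2 / se_g2
print(round(se_g1, 3), round(cr_g1, 1))
print(round(se_g2, 3), round(cr_g2, 2))
```

Both critical ratios exceed two standard errors, and the kurtosis ratio also exceeds the 2.58 required at the 1 per cent level.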

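¹ For a more exact interpretation of the critical ratio see L. H. C.
Tippett, The Methods of Statistics, 2d ed., page 86.

4. The Significance of the Difference between Any Two
Statistics.—The variance of the differences, D, between N paired
values of two variables, $X_1$ and $X_2$, is, by the usual formula,

$$\sigma_D^2 = \frac{\Sigma(D - M_D)^2}{N},$$

where $M_D$ is the mean of the differences. Or, since
$D = X_1 - X_2$, and $M_D = M_{X_1} - M_{X_2}$,

$$\sigma_D^2 = \frac{\Sigma[(X_1 - M_{X_1}) - (X_2 - M_{X_2})]^2}{N}.$$

Letting $X_1 - M_{X_1} = x_1$, and $X_2 - M_{X_2} = x_2$,

$$\sigma_D^2 = \frac{\Sigma(x_1 - x_2)^2}{N}
= \frac{\Sigma(x_1^2 - 2x_1x_2 + x_2^2)}{N}
= \frac{\Sigma x_1^2}{N} - \frac{2\Sigma x_1x_2}{N} + \frac{\Sigma x_2^2}{N}
= \sigma_1^2 + \sigma_2^2 - 2\sigma_1\sigma_2\,\frac{\Sigma x_1x_2}{N\sigma_1\sigma_2}.$$

By formula (81), $\Sigma x_1x_2 / N\sigma_1\sigma_2 = r_{12}$, so that

$$\sigma_D^2 = \sigma_1^2 + \sigma_2^2 - 2r_{12}\sigma_1\sigma_2.$$

If now we let $\sigma = \epsilon$, we have

$$\epsilon_D^2 = \epsilon_1^2 + \epsilon_2^2 - 2r_{12}\epsilon_1\epsilon_2, \qquad (136)$$
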


where $\epsilon_1$ is the standard error of the statistic in the first sample,
$\epsilon_2$ is the standard error of the corresponding statistic in the
second sample, and $r_{12}$ is the correlation coefficient between a
number of sample values of the two statistics.

Usually, correlation between two sample statistics is purposely 
introduced by drawing one sample at random, and then matching 
on some principle each of the items or values so drawn with an 
item or value from another population. For example, the I.Q.’s 
of a random sample of criminals may be matched or paired with 
the I.Q.’s of their brothers. 

If the statistics are the means or proportions from two samples
whose individual values are matched in some way, the simple
correlation coefficient, $r_{12}$, in the case of means, or $r_t$ (tetrachoric
correlation coefficient) in the case of proportions, may be used
to determine the amount of correlation between the paired
items of the two samples. Where correlation is believed to
exist between two samples, but it is not known what items are
paired or on what principle the correlation depends, it is often
difficult to find the value of $r_{12}$.

When there is no correlation between the two samples, i.e.,
when both of the samples are random or simple and so are
independent, $r_{12} = 0$, and formula (136) reduces to

$$\epsilon_D^2 = \epsilon_1^2 + \epsilon_2^2. \qquad (137)$$

Formulas (136) and (137) make no assumptions in addition
to those involved in finding $\epsilon_1$ and $\epsilon_2$.

5. The Difference between Two Means.—A simpler formula
than (136) for testing the difference between the means of two
matched samples is

$$\epsilon_D = \frac{\sigma_d}{\sqrt{N}}, \qquad (138)$$

where $\sigma_d$ is the standard deviation of the differences between
the paired values, and N is the number of pairs. The $\sigma_d$ is
estimated from the usual formula for the standard deviation.
Formula (138) assumes that the experimental sample (i.e., the
random sample that receives the "treatment")¹ is a simple
sample.

The scores of a simple sample of brothers and their sisters on a
personality test are shown in Table 71. Are the means of the
two series significantly different? Correlation is evidently present
between brother and sister, so it is necessary to calculate the
correlation between them or else to use formula (138). We shall
do both, for comparison. For Table 71 we have, by formula (74),

$$r = \frac{N\Sigma XY - \Sigma X\,\Sigma Y}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}
= \frac{40(1{,}215) - 212(203)}{\sqrt{[40(1{,}320) - (212)^2][40(1{,}231) - (203)^2]}},$$

$$r = .70.$$

$$\sigma_d = \sqrt{\frac{\Sigma d^2}{N} - \left(\frac{\Sigma d}{N}\right)^2} = 1.73.$$

Now, the standard error squared of the mean of the X's is, by
formula (118) of Chap. XII,

$$\epsilon_{M_X}^2 = \frac{\sigma_X^2}{N},$$

$$\sigma_X^2 = \frac{1{,}320}{40} - \left(\frac{212}{40}\right)^2 = 4.91,$$

$$\epsilon_{M_X}^2 = \frac{4.91}{40} = 0.12.$$

¹ See Chaps. III and IV.



Table 71.—Personality Test Scores of 40 Pairs of Brothers and
Sisters. (Hypothetical Data)

Brother   Sister
  (X)      (Y)       XY      X²      Y²      d      d²
    8        5       40      64      25      3       9
    3        4       12       9      16     -1       1
    8        7       56      64      49      1       1
    2        2        4       4       4      0       0
    2        3        6       4       9     -1       1
    4        7       28      16      49     -3       9
    6        5       30      36      25      1       1
    5        3       15      25       9      2       4
    7        9       63      49      81     -2       4
   10        9       90     100      81      1       1
   10        8       80     100      64      2       4
    1        3        3       1       9     -2       4
    9        7       63      81      49      2       4
    8        6       48      64      36      2       4
    7        5       35      49      25      2       4
    7        8       56      49      64     -1       1
    5        4       20      25      16      1       1
    5        6       30      25      36     -1       1
    5        5       25      25      25      0       0
    4        6       24      16      36     -2       4
    4        4       16      16      16      0       0
    4        3       12      16       9      1       1
    4        1        4      16       1      3       9
    3        3        9       9       9      0       0
    3        2        6       9       4      1       1
    3        5       15       9      25     -2       4
    3        4       12       9      16     -1       1
    5        7       35      25      49     -2       4
    8       10       80      64     100     -2       4
    6        5       30      36      25      1       1
    6        4       24      36      16      2       4
    7        7       49      49      49      0       0
    7        4       28      49      16      3       9
    5        8       40      25      64     -3       9
    4        1        4      16       1      3       9
    2        2        4       4       4      0       0
    6        5       30      36      25      1       1
    4        3       12      16       9      1       1
    7        6       42      49      36      1       1
    5        7       35      25      49     -2       4
  212      203    1,215   1,320   1,231      9     121

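Similarly, for the Y's,

$$\sigma_Y^2 = \frac{1{,}231}{40} - \left(\frac{203}{40}\right)^2 = 5.02,$$

$$\epsilon_{M_Y}^2 = \frac{5.02}{40} = 0.13.$$

Substituting in formula (136),

$$\epsilon_D^2 = 0.12 - 2(0.7)(0.35)(0.36) + 0.13 = 0.07, \quad \text{or} \quad \epsilon_D = 0.27.$$

Now,

$$M_X = \frac{\Sigma X}{N} = \frac{212}{40} = 5.30,$$

$$M_Y = \frac{\Sigma Y}{N} = \frac{203}{40} = 5.08,$$

$$\text{C.R.}^{1} = \frac{5.30 - 5.08}{0.27} = 0.81.$$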

The critical ratio is much less than two, so the difference between 
the two means is not significant. 

Substituting next in formula (138), 


$$\epsilon_D = \frac{1.73}{\sqrt{40}} = 0.27,$$

which quickly gives the same value obtained by the longer 
method. 

The meaning of this result is that there is no more difference 
between the scores of brothers and sisters on a personality test 
than might be attributed to random errors of sampling. 
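
The whole comparison can be repeated from the column totals of Table 71 alone. The sketch below works with unrounded intermediate values, so its critical ratios (about 0.83 and 0.45) differ slightly from the rounded 0.81 and 0.44 of the text; the variable names are illustrative.

```python
import math

# Column totals of Table 71.
N = 40
sum_x, sum_y = 212, 203
sum_xy, sum_x2, sum_y2 = 1215, 1320, 1231
sum_d, sum_d2 = 9, 121

# Correlation between brothers and sisters, formula (74).
r = (N * sum_xy - sum_x * sum_y) / math.sqrt(
    (N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2))

# Standard errors of the two means, formula (118).
se_x = math.sqrt((sum_x2 / N - (sum_x / N) ** 2) / N)
se_y = math.sqrt((sum_y2 / N - (sum_y / N) ** 2) / N)

# Standard error of the difference by formula (136), allowing for correlation.
se_d_136 = math.sqrt(se_x ** 2 + se_y ** 2 - 2 * r * se_x * se_y)

# The same quantity by formula (138), from the paired differences.
sd_d = math.sqrt(sum_d2 / N - (sum_d / N) ** 2)
se_d_138 = sd_d / math.sqrt(N)

# Formula (137), which wrongly ignores the correlation.
se_d_137 = math.sqrt(se_x ** 2 + se_y ** 2)

diff = sum_x / N - sum_y / N
cr_matched = diff / se_d_138
cr_independent = diff / se_d_137
print(round(r, 2), round(se_d_136, 2), round(se_d_138, 2))
print(round(cr_matched, 2), round(cr_independent, 2))
```

Formulas (136) and (138) agree exactly here, since the variance of the paired differences is the sum of the two variances less twice the covariance.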

Suppose we had neglected the correlation between the data 
of Table 71, and used formula (137) to test the significance of the 
difference between the two means. How much would the result 
have been changed? We have, from formula (137), 


$$\epsilon_D^2 = 0.12 + 0.13 = 0.25$$

and

$$\text{C.R.} = \frac{5.30 - 5.08}{0.50} = 0.44.$$


¹ In testing differences, the critical ratio (C.R.) is the ratio of the difference
to the standard error of the difference.

The correction for dependence thus almost doubled the critical 
ratio, although in this instance it did not change the verdict 
regarding the significance of the difference. 

When the hypothesis to be tested is that two simple samples 
were drawn from the same universe, the best estimate of any 
parameter is found by pooling the two samples. For example, 
if we are testing the difference between the means of two samples, 
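formula (137) becomes

$$\epsilon_D^2 = \sigma^2\left(\frac{1}{N_1} + \frac{1}{N_2}\right), \qquad (139)$$

where $N_1$ is the number of cases in the first sample, $N_2$ is the
number of cases in the second sample, and $\sigma^2$ is found from the
two samples combined by the equation¹

$$\sigma^2 = \frac{\Sigma(X_1 - M_{X_1})^2 + \Sigma(X_2 - M_{X_2})^2}{N_1 + N_2}, \qquad (140)$$

where $X_1$ is any value of the variate in the first sample, $M_{X_1}$ is
the mean of the first sample, $X_2$ is any value of the variate in the
second sample, and $M_{X_2}$ is the mean of the second sample.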

¹ Formula (140) gives simply a weighted mean of the two variances, $\sigma_1^2$ and
$\sigma_2^2$, and should not be confused with formula (29) of Chap. VIII, which gives
the variance of combined distributions.

Table 72, below, gives the scores of 75 communities on a
community organization test, the sample of communities being
simple and independent of the sample of 100 communities in
Table 69 of Chap. XII.

Table 72.—Scores of 75 Communities on a Community Organization Test

Score (X)    Communities (f)     d      fd     fd²
80-99               7            2      14      28
60-79              15            1      15      15
40-59              29            0       0       0
20-39              13           -1     -13      13
0-19               11           -2     -22      44
Total              75                   -6     100

The mean score of Table 72 is 48.4,
and its standard deviation is 23. Let us test the hypothesis
that these two tables represent simple samples from the same
universe. The samples being independent by definition, we
require formula (139). The variance of the two samples
combined is found by formula (140), expressed in frequency
form:


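$$\sigma^2 = \frac{i^2\left[\Sigma f d_1^2 - \dfrac{(\Sigma f d_1)^2}{N_1} + \Sigma f d_2^2 - \dfrac{(\Sigma f d_2)^2}{N_2}\right]}{N_1 + N_2} \qquad (140a)$$

$$= \frac{(20)^2\left[117 - \dfrac{(-7)^2}{100} + 100 - \dfrac{(-6)^2}{75}\right]}{100 + 75},$$

$$\sigma^2 = 493.78.$$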


Substituting this value in formula (139), 

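$$\epsilon_D^2 = 493.78\left(\frac{1}{100} + \frac{1}{75}\right) = 11.52,$$

$$\epsilon_D = 3.394.$$

We therefore have

$$\text{C.R.} = \frac{M_{X_1} - M_{X_2}}{\epsilon_D} = \frac{48.6 - 48.4}{3.394} = 0.0589.$$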

Evidently, the data of the two tables might well represent 
simple samples from the same universe, as far as their mean 
values are concerned. 

If it is believed that two simple samples are from different 
universes, and it is wanted to test whether the difference between 
their means falls within the range of chance error so that it 
might sometimes be obliterated by sampling error, formula (137) 
takes the form

$$\epsilon_D^2 = \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}. \qquad (141)$$


Applying this formula to Tables 69 and 72, we get 


$$\epsilon_D^2 = \frac{466.56}{100} + \frac{530.77}{75} = 11.74,$$

$$\epsilon_D = 3.43,$$

which is slightly larger than the standard error obtained on the
assumption that the samples are from the same universe. Since
the critical ratio is only

$$\text{C.R.} = \frac{48.6 - 48.4}{3.43} = .0583,$$


we interpret it to mean that if there is a real difference between 
the universes from which the two samples came, it may easily be 
reduced to zero or nearly zero in random samples. 
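
The two standard errors just compared can be set side by side in a short sketch. The variances and sample sizes are those quoted in the text; the mean of 48.6 for Table 69 is an assumption inferred from the combined mean of 48.51 and is not printed in this chapter.

```python
import math


def se_diff_same_universe(pooled_variance, n1, n2):
    # Formula (139): one pooled variance serves both samples.
    return math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))


def se_diff_different_universes(var1, n1, var2, n2):
    # Formula (141): each sample keeps its own variance.
    return math.sqrt(var1 / n1 + var2 / n2)


mean_69, mean_72 = 48.6, 48.4     # mean_69 is inferred, see lead-in
se_139 = se_diff_same_universe(493.78, 100, 75)
se_141 = se_diff_different_universes(466.56, 100, 530.77, 75)
cr_139 = (mean_69 - mean_72) / se_139
cr_141 = (mean_69 - mean_72) / se_141
print(round(se_139, 3), round(cr_139, 4))
print(round(se_141, 2), round(cr_141, 4))
```

With samples of nearly equal variance the choice between the two formulas makes little difference; it matters more when the variances or the sample sizes are very unequal.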

Sometimes two simple samples are taken from the same 
universe, and the mean of sample 1 is tested against the mean 
of the two samples combined. Correlation is thereby introduced, 
and the appropriate formula is then 


$$\epsilon_D^2 = \frac{\sigma^2 N_2}{N_1(N_1 + N_2)}, \qquad (142)$$

where $\sigma^2$ is found from the two samples combined, using formula
(140) or formula (140a). This formula leads to the same critical 
ratio as formula (139). To show this, let us test the mean score 
(48.4) of the 75 communities in Table 72 against the mean 
score (48.51) of Table 72 and Table 69 of Chap. XII combined, 
on the theory that the two samples together give a better picture 
of the universe of communities from which the samples were 
drawn than does either one alone. Substituting in formula (142) 
the values previously found,

$$\epsilon_D^2 = \frac{493.78(100)}{75(75 + 100)} = 3.76,$$

$$\epsilon_D = 1.9396,$$

$$\text{C.R.} = \frac{48.51 - 48.4}{1.9396} = 0.0589,$$


which is identical with the result previously obtained. 
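
That formulas (139) and (142) must give the same critical ratio can also be confirmed numerically. In the sketch below sample 1 is Table 72 (75 communities, mean 48.4) and sample 2 is Table 69 (100 communities); the mean of 48.6 for Table 69 is an assumption inferred from the combined mean of 48.51.

```python
import math


def cr_between_samples(m1, n1, m2, n2, pooled_variance):
    # Formula (139): difference between the two sample means.
    se = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return (m2 - m1) / se


def cr_sample_vs_combined(m1, n1, m2, n2, pooled_variance):
    # Formula (142): mean of sample 1 against the mean of both combined.
    combined_mean = (n1 * m1 + n2 * m2) / (n1 + n2)
    se = math.sqrt(pooled_variance * n2 / (n1 * (n1 + n2)))
    return (combined_mean - m1) / se


cr_a = cr_between_samples(48.4, 75, 48.6, 100, 493.78)
cr_b = cr_sample_vs_combined(48.4, 75, 48.6, 100, 493.78)
print(round(cr_a, 4), round(cr_b, 4))
```

The agreement is algebraic and not an accident of these figures: the combined mean differs from the mean of sample 1 by the fraction N2/(N1 + N2) of the difference between the two means, and the standard error shrinks in the same proportion.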

6. The Difference between Two Proportions.—Although we
have so far dealt only with the differences between means of
samples, the same types of formula hold for other statistics.
For example, if we are testing the difference between two proportions,
the formulas corresponding to formulas (139), (140),
(141), and (142) are, in order:

Two simple samples from the same universe,

$$\epsilon_D^2 = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad (143)$$

where

$$p = \frac{n_1p_1 + n_2p_2}{n_1 + n_2}. \qquad (144)$$

Two simple samples from different universes,

$$\epsilon_D^2 = \frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}. \qquad (145)$$

Two simple samples from the same universe, the proportion of
successes in the first tested against the mean proportion of
successes in the two combined,

$$\epsilon_D^2 = \frac{pq\,n_2}{(n_1 + n_2)n_1}. \qquad (146)$$


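Can samples 1 and 2 of Table 73, below, be simple samples
from the same universe, in the proportion of families having
seven or more members? Applying formula (143), we need the
value of p from formula (144):

$$p = \frac{(132)\left(\frac{14}{132}\right) + (134)\left(\frac{9}{134}\right)}{132 + 134} = .0865.$$

Whence

$$q = 1.0000 - .0865 = .9135,$$

$$\epsilon_D^2 = (.0865)(.9135)\left(\frac{1}{132} + \frac{1}{134}\right) = .001188,$$

$$\epsilon_D = 0.0345,$$

$$\text{C.R.} = \frac{\frac{14}{132} - \frac{9}{134}}{.0345} = 1.13.$$
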

Table 73.—Two Samples of Families, Classified by Number of Members

Members in family    Sample 1 (f1)    Sample 2 (f2)
1-2                        56               66
3-4                        40               42
5-6                        22               17
7-8                         6                5
9-10                        4                1
11-12                       2                2
13-14                       0                1
15-16                       2                0
Total                     132              134


From Appendix Table 1, we find a probability of about 26 in 100
that a difference greater than that observed between the proportions
in the two samples might occur by sampling error, under
the conditions assumed. If now we use the alternative formula
(146), we get

$$\epsilon_D^2 = \frac{(.0865)(.9135)(134)}{(132 + 134)132} = .000302,$$

$$\epsilon_D = 0.0174,$$

$$\text{C.R.} = \frac{\frac{14}{132} - \frac{23}{266}}{.0174} = 1.13.$$


Notice again that this is the same critical ratio as that obtained 
just above by the use of formula (143). 
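
The proportion test may be sketched as follows, using the counts of families with seven or more members read from Table 73 (14 of 132 and 9 of 134). The probability is taken from the normal curve directly instead of from Appendix Table 1; variable names are illustrative.

```python
import math

n1, successes1 = 132, 14    # sample 1: families with seven or more members
n2, successes2 = 134, 9     # sample 2

p1, p2 = successes1 / n1, successes2 / n2
p = (successes1 + successes2) / (n1 + n2)    # formula (144)
q = 1.0 - p

# Formula (143): the two sample proportions against each other.
se_143 = math.sqrt(p * q * (1.0 / n1 + 1.0 / n2))
cr_143 = (p1 - p2) / se_143

# Formula (146): sample 1 against the two samples combined.
se_146 = math.sqrt(p * q * n2 / ((n1 + n2) * n1))
cr_146 = (p1 - p) / se_146

# Two-tailed area of the normal curve beyond the critical ratio.
p_two_tailed = math.erfc(abs(cr_143) / math.sqrt(2.0))
print(round(p, 4), round(se_143, 4), round(cr_143, 2))
print(round(se_146, 4), round(cr_146, 2), round(p_two_tailed, 2))
```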

7. The Difference between Two Correlation Coefficients.—To
test the significance of the difference between two correlation
coefficients from simple samples,¹ the variates being normally
distributed and independent, it is necessary to convert $r_1$ and $r_2$
to $z_1$ and $z_2$, respectively. This is readily done by means of
Appendix Table 5. The standard error of z is then found from
the formula

$$\epsilon_z = \frac{1}{\sqrt{N - 3}}. \qquad (147)$$

Suppose for the correlation between the linguistic ability and
leadership scores of a group of children, we find $r_1 = .50$ from
sample 1, and $r_2 = .60$ from sample 2, where $N_1 = N_2 = 50$.
Is the difference between the two r's significant, or is it merely
an accident of sampling? From Appendix Table 5, we find for
$r_1 = .50$, $z_1 = .549$, and for $r_2 = .60$, $z_2 = .693$. By formula
(147) we calculate the standard error of z,

$$\epsilon_z = \frac{1}{\sqrt{50 - 3}} = 0.146.$$

Hence the standard error of the difference $z_2 - z_1$ is, by formula
(137),

$$\epsilon_D^2 = (.146)^2 + (.146)^2 = 0.0426.$$

So that

$$\text{C.R.} = \frac{.693 - .549}{\sqrt{.0426}} = 0.699.$$

Since this critical ratio is well under two standard errors, we
infer that the difference between the two r's is not significant.

¹ Of any size.
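
Where Appendix Table 5 is not at hand, the conversion can be computed. The sketch below assumes that the table tabulates the usual transformation z = ½ log_e[(1 + r)/(1 - r)], which is the inverse hyperbolic tangent of r; this assumption reproduces the values .549 and .693 read from the table above.

```python
import math


def r_to_z(r):
    # Assumed content of Appendix Table 5: z = atanh(r).
    return math.atanh(r)


def cr_two_correlations(r1, n1, r2, n2):
    z1, z2 = r_to_z(r1), r_to_z(r2)
    # Formula (147) for each z, combined by formula (137).
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z2 - z1) / se_diff


z_1, z_2 = r_to_z(0.50), r_to_z(0.60)
se_z = 1.0 / math.sqrt(50 - 3)
cr_z = cr_two_correlations(0.50, 50, 0.60, 50)
print(round(z_1, 3), round(z_2, 3), round(se_z, 3), round(cr_z, 3))
```

Carried without rounding the critical ratio is about 0.697, against the 0.699 obtained from the rounded figures.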

8. Testing the Significance of a Sum. —The basic formulas 
(136) and (137) are the same for the sum as for the difference 
between two statistics, except that for the sum all the signs 
of formula (136) are positive. 

9. Testing the Hypothesis of Simple Sampling.—Suppose we
ask if Table 72 above can be a simple sample from a universe
of communities in which the mean score is 40. We have
$\epsilon_M = \sigma_X/\sqrt{N} = 23/\sqrt{75} = 2.66$,¹ so that

C.R. = (48.4 - 40.0)/2.66 = 3.16.

Since the critical ratio is greater than two standard errors, it is 
not likely that the sample is a simple sample from a universe of 
communities whose mean score is 40. There are several possible 
explanations: (1) Table 72 may be a simple sample with an 
extreme mean that might rarely be drawn by chance from the 
given universe; (2) it may be a sample from the given universe, 
but not taken as a simple sample should have been; or (3) it 
may not be a sample from the given universe at all. There is no 
way to determine which of these possibilities is correct, unless 
it can be learned how the sample was actually taken. 

The purpose of testing the difference between the two 
means in Sec. 5, above, might have been to discover whether 
or not they could occur in two simple samples from the same 
universe. The very low critical ratio of 0.0589 suggests an 
affirmative answer, but it cannot completely establish the fact. 
For example, the low critical ratio might be due to the small 
size of the samples or to the presence of correlation, or it might 
be an accident not connected with random sampling. 

The same test can, of course, be employed to determine 
whether a sample might be random or Poisson, by merely using 
the standard error formula appropriate in each case. 

¹ $\epsilon_M$ is found from formula (118), Chap. XII.

10. The Significance of the Difference between Two or More
Frequency Distributions.—A more complete test of the hypothesis
that two or more samples are simple samples from the same
universe is possible by the Chi-square method, which goes
beyond the comparison of single statistics (e.g., means) to the
comparison of whole distributions. In Table 73 we have two
samples of families distributed according to number of members.
We can use either formula (148) or formula (149) to find Chi
square ($\chi^2$). Formula (148) is applicable only to the case of
two samples, when the total $\chi^2$ for each row is not wanted, but is
quicker than formula (149). We shall apply it here.


$$\chi^2 = \frac{1}{pq}\left[\Sigma({}_rf_1 \cdot {}_rp_1) - n_1p\right], \qquad (148)$$

$$\chi^2 = \Sigma\frac{(f_o - f_t)^2}{f_t}, \qquad (149)$$

where

$$p = \frac{n_1}{N}, \qquad {}_rp_1 = \frac{{}_rf_1}{n_r}, \qquad q = 1 - p, \qquad f_t = \frac{n_rn_c}{N};$$

${}_rf_1$ is the frequency in row r of col. 1, $n_c$ is the total of any column
c, $n_1$ is the total of col. 1, $n_r$ is the total of any row r, $f_o$ is any
observed frequency, $f_t$ is any theoretical or expected frequency,
and N is the total of the whole table. For Table 73 we have

$$p = \frac{132}{266} = 0.496,$$

$$q = 1 - 0.496 = 0.504,$$

$${}_1p_1 = \frac{56}{122} = 0.459,$$

$${}_2p_1 = \frac{40}{82} = 0.488,$$

$${}_3p_1 = \frac{22}{39} = 0.564,$$

$${}_4p_1 = \frac{14}{23} = 0.609.$$

To get ${}_4p_1$, the frequencies of the last five rows were combined, in
accordance with the rule that no cell should contain less than
five expected frequencies. Substituting in formula (148),

$$\chi^2 = \frac{1}{(.496)(.504)}\,[56(.459) + 40(.488) + 22(.564) + 14(.609) - 132(.496)],$$

$$\chi^2 = 2.76.$$

Entering a table of Chi square (Appendix Table 2) with

$$r - 1 = 4 - 1 = 3 \text{ degrees of freedom}$$

(i.e., one less than the number of rows, counting the five combined
rows as one), we find a probability P between .30 and .50
that the differences between the two samples might be due to
errors of simple sampling from the same universe. The test
therefore fails to show that the two sample distributions of
families differ significantly in number of members.

The Chi-square test may also be used to investigate whether
a sample is a simple sample from a given universe, if the
distribution of the universe is known. The universe distribution,
with N equated to that of the sample, simply takes the place
of one of the samples in Table 73.

To test whether more than two samples are from the same 
universe, it is necessary to find Chi square by formula (149). 
Its application to three samples in Table 74 is shown below. 


Table 74.—Three Samples of Families, Classified by Number of Members

(For each sample the three columns are f_o, f_t, and (f_o - f_t)²/f_t.)

Members in           Sample 1                    Sample 2                    Sample 3           Total
family         f_o     f_t    (f_o-f_t)²/f_t   f_o     f_t    (f_o-f_t)²/f_t   f_o     f_t    (f_o-f_t)²/f_t   f_o

 1- 2           56    56.34      .00205         66    57.20     1.35385         53    61.46     1.16452        175
 3- 4           40    40.57      .00801         42    41.18      .01633         44    44.25      .00141        126
 5- 6           22    21.57      .00857         17    21.90     1.09635         28    23.53      .84917         67
 7- 8            6                               5                               9                              20
 9-10            4                               1                               3                               8
11-12            2                               2                               5                               9
13-14            0                               1                               1                               2
15-16            2                               0                               1                               3
 7-16 combined  14    13.52      .01704          9    13.73     1.62949         19    14.75     1.22457         42

Total          132   132.01     0.03567        134   134.00     4.09602        144   143.99     3.23967        410


The expected frequency, $f_t$, in any cell is found, as explained in
Chap. IX, by dividing the table total into the product of the
row and column totals. For example, the expected frequency
in the class interval 3-4, sample 2, is 126(134)/410 = 41.18;
in the class interval 5-6, sample 3, it is 67(144)/410 = 23.53;
and so on. The last five rows are combined, because four of
them have fewer than five expected frequencies. After combining,
the expected frequency for sample 1 is (132)(42)/410 = 13.52. By
formula (149),

$$\chi^2 = 0.03567 + 4.09602 + 3.23967 = 7.37136.$$



























The degrees of freedom are

$$(c - 1)(r - 1) = (3 - 1)(4 - 1) = 6,$$

where c is the number of columns, and r is the number of rows,
counting all combined rows as one. Entering a table of $\chi^2$ with
six degrees of freedom, we see that $\chi^2 = 12.59$ for P = .05.
That is, a value of $\chi^2$ as large as 12.59 or larger may be expected
by chance five times in 100. Smaller values of $\chi^2$ will, of course,
occur more often by chance. Since our value of $\chi^2$ is only 7.37136, it
cannot be regarded as significant. The test therefore furnishes
no evidence that our three samples are not simple samples from
the same universe.
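
The computation by formula (149) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original method of hand tabulation: the function name `chi_square` is an assumption, and the observed counts are those of Table 74 with the last five rows already combined.

```python
# Observed frequencies from Table 74
# (rows: 1-2, 3-4, 5-6, and 7-16 combined; columns: samples 1, 2, 3).
observed = [
    [56, 66, 53],
    [40, 42, 44],
    [22, 17, 28],
    [14, 9, 19],
]


def chi_square(table):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, f_o in enumerate(row):
            # Expected frequency: row total times column total over table total.
            f_t = row_totals[r] * col_totals[c] / n
            chi2 += (f_o - f_t) ** 2 / f_t
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


chi2, dof = chi_square(observed)
print(round(chi2, 2), dof)
```

Because the expected frequencies are not rounded to two decimals here, the result agrees with the hand calculation only to about two decimal places (approximately 7.37, with 6 degrees of freedom).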

11. The Significance of the Difference between Statistics
from More than Two Samples.—In testing the significance of
the differences between statistics (e.g., means) from three or more
samples, the probability of finding a significant difference by
chance is greater than in the case of only one difference, just as
the probability of getting an ace at cards is greater when we draw
twice from the deck than when we draw only once.

A formula that takes this into account is the following:

where $d_i$ is the difference between any two independent statistics,
$\epsilon_i$ is its standard error, and n is the number of differences. In


Table 75.—Six Samples of 50 Juvenile Delinquents Each, and Six
Control Samples, Showing Percentages Neurotic

Samples        Percentage      Percentage         d_i     ε_i     d_i/ε_i
               delinquents     nondelinquents
               neurotic        neurotic

1 and 1a            4               6             -2      4.4     -0.45
2 and 2a           10               4              6      5.1      1.18
3 and 3a            2               4             -2      3.4     -0.59
4 and 4a            0               2             -2      2.0     -1.00
5 and 5a            4               8             -4      4.7     -0.85
6 and 6a            6               2              4      3.9      1.03

Total                                                             -0.68

























Table 75 we have six simple samples of juvenile delinquents and
six simple samples of nondelinquents, each containing 50 boys.
Using formula (143) to find the standard error of the six differences,
we have, for the first pair of samples in the table,

$$\epsilon_1 = \sqrt{\frac{4(96)}{50} + \frac{6(94)}{50}} = 4.4.$$

The standard errors of the other differences are found similarly, 
and entered in Table 75. Substituting from the last column 
of the table in formula (150), 

C.R. = (-0.68) = -0.25. 

From a table of normal areas (Appendix Table 1), it appears 
that a positive or negative critical ratio greater than this might 
occur by chance over 80 times in 100 trials, so that there is no 
evidence that the samples of delinquents differ significantly 
from the samples of nondelinquents in respect to the percentages 
neurotic. 

Formula (150) is applicable to any set of independent critical
ratios, including those from random or Poisson samples, if
random or Poisson standard error formulas are used to find the
values of $\epsilon_i$.


Exercises 

1. Correlate the birth rates of Table 70 of Chap. XII with the
fertility ratios of the same sample counties in the table of Exercise 11 in
Chap. XII, and test whether the correlation coefficient is significantly
greater than zero.

2. In the table of Exercise 3, below, test the hypothesis that rural 
farm and rural nonfarm families (1) are from the same universe in 
respect to mean size of family; (2) are from different universes, but their 
means might sometimes be approximately equal as a result of sampling 
error. 

3. In the table below test the assumption that urban families might 
be a random sample from a universe which is best represented by the 
three samples combined. Use the mean as the criterion. 





Sample of 2,992 Families by Size, for Urban and Rural Areas, United
States, 1930

Members in        Urban         Rural farm     Rural nonfarm      Total
family            families      families       families

 1                    140            34              62             236
 2                    436           121             141             698
 3                    384           119             120             623
 4                    315           110              99             524
 5                    202            88              68             358
 6                    118            66              43             227
 7                     66            47              27             140
 8                     37            32              16              85
 9                     20            20               9              49
10                     10            12               5              27
11                      5             6               2              13
12 or more*             4             6               2              12

Sample total        1,737           661             594           2,992

Universe total 17,400,000     6,600,000       5,900,000      29,900,000

* Count as 13.


4. Do the matched delinquents and nondelinquents in the sample
below differ significantly in mean I.Q.?

Intelligence Quotients of a Random Sample of 25 Male Juvenile
Delinquents and 25 Male Nondelinquents, Matched by Age,
Family Income, and Place of Residence

                      Intelligence quotient
Pair number       Delinquent      Nondelinquent

     1               103               99
     2                80               92
     3               114               ..
     4               100               ..
     5                91               88
     6                73               80
     7               105              109
     8                98               94
     9                86               90
    10               101               97
    11                92               89
    12                86               91
    13                93               90
    14                90               97
    15                79               84
    16               108               96
    17                82               91
    18                95               86
    19                74               83
    20               102               97
    21               105               99
    22                97              103
    23                88               91
    24                94               84
    25                99              106

Total              2,335            2,346


5. Combining rural farm and rural nonfarm families in the table of 
Exercise 3, above, is there a significant difference between the mean 
size of family in urban and rural areas according to this sample taken 
proportionally at random from the two types of areas? 

6. Apply Chi-square to the table of Exercise 3 above to test (1) the 
hypothesis that rural farm and rural nonfarm families are simple samples 
from the same universe; (2) the hypothesis that urban, rural farm, and 
rural nonfarm families are simple samples from the same universe. 

7. In the table of Exercise 3 above, test the hypothesis that the means 
of the three simple samples are from the same universe. 

8. Test the hypothesis that families of odd sizes and families of even 
sizes in the table of Exercise 3 above are Poisson samples from the 
same stratified universe. 

9. Test the hypothesis that the value of a selected statistic from each
of the samples drawn in Exercise 4 of Chap. XII does not differ
significantly from the known value of the corresponding parameter in the
universe.

10. Test the hypothesis that the value of a selected statistic from the 
sample drawn in Exercise 5 of Chap. XII does not differ significantly 
from the known value of the corresponding parameter in the universe. 

References 

Same as for Chap. XII. 




CHAPTER XIV 
TIME SERIES ANALYSIS 

Values of a variable (e.g., infant death rates) given at successive
intervals of time (e.g., yearly) form a time series. Such series
are especially important in economics and are also necessary in
the study of vital statistics,¹ of trends in public expenditures for
relief, and many other topics. This chapter describes methods
for their analysis.

As an illustrative problem, let us inquire what the state of 
Wisconsin has accomplished in reducing infant mortality. 
Figures giving deaths per 1,000 live births from 1908 through 
1935 are shown as a time series in Table 76. 


Table 76.—Wisconsin Infant Mortality Rates, 1908-1935*

Year     Infant deaths        Year     Infant deaths
         per 1,000 live                per 1,000 live
         births                        births

1908         107              1922          70
1909         120              1923          70
1910         109              1924          64
1911         103              1925          67
1912          95              1926          67
1913          97              1927          60
1914          83              1928          61
1915          78              1929          60
1916          86              1930          56
1917          78              1931          53
1918          79              1932          50
1919          79              1933          48
1920          77              1934          49
1921          72              1935          46

* Report of the Wisconsin Bureau of Vital Statistics, 1934-1935, p. 284.


The first step that is usually taken in time series analysis is to
plot the data. This is done for Table 76 in the lower graph of
Fig. 55. Examination of this figure shows a striking decline in
infant mortality in Wisconsin over a 28-year period.

¹ Birth rates, death rates, marriage rates, and so on.



Note: Dots indicate moving averages.

Fig. 55.—Infant mortality rates for Wisconsin and for the United States, 1908-
1935. (From Tables 76, 77, and 80.)

1. The Secular Trend: A Straight Line.—Suppose, further,
that we want to compare the infant mortality record in Wisconsin
with that of other states in the United States. Data for the
original birth registration area of 10 states and the District of
Columbia¹ are available for the period 1915 through 1933, and
are entered in Table 77. They are plotted as a dotted line in
Fig. 55. It is seen, from this figure, that infant mortality has
been less in Wisconsin than in the original registration area
throughout the entire period of comparison. But has the rate
of decline since 1915 been greater in Wisconsin or in the
original registration area?

¹ Connecticut, Maine, Massachusetts, Michigan, Minnesota, New Hampshire,
New York, Pennsylvania, Rhode Island, Vermont, and the District of Columbia.


Table 77.—Infant Mortality in the Original Birth Registration
Area of the United States, 1915-1933*

Year     Infant deaths        Year     Infant deaths
         per 1,000 live                per 1,000 live
         births                        births

1915         100              1925          74
1916         100              1926          75
1917          96              1927          64
1918         106              1928          67
1919          89              1929          65
1920          90              1930          62
1921          79              1931          60
1922          79              1932          55
1923          79              1933          53
1924          72

* From Birth, Stillbirth, and Infant Mortality Statistics, 1933, p. 7, U. S. Bureau of the
Census.


The answer lies in determining which of the two series has the steeper
slope. Inspection shows that both graphs are irregular and
saw-toothed in shape, so that the slope sometimes of one and
sometimes of the other is the steeper. What we must do is to
remove the irregularities in the two series, i.e., reduce them to
smooth curves. To do this is to find the secular trend, meaning
the general direction of the series over a considerable period of
time, freed from confusing oscillations. To answer our particular
question in the present case, it seems appropriate to fit
straight lines to the data, since more complex smooth curves do
not describe the declining death rates any better.

We have already learned to fit a straight line by the device
of least squares, in determining the regression equation in linear
correlation. The normal equations for finding the values of a
and b in the line of best fit are

$$a = \frac{\sum Y}{N}, \qquad (151)$$

$$b = \frac{\sum xY}{\sum x^2}, \qquad (152)$$

where x is a deviate from the midyear of an odd¹ number of years,
Y is the infant death rate, N is the number of years, and the origin
is at the midyear or mean of the X's. The values of a and b so found
are substituted in the equation of the straight line,

$$Y_c = a + bx. \qquad (153)$$

We set up Table 78 to obtain these values for the Wisconsin
series, and Table 79 for the registration area series.


Table 78.—Fitting a Straight Line to the Wisconsin Data of Table 76

Year     Year      Infant death rate     x_1 Y      x_1²
         (x_1)            (Y)

1915      -9               78            -702        81
1916      -8               86            -688        64
1917      -7               78            -546        49
1918      -6               79            -474        36
1919      -5               79            -395        25
1920      -4               77            -308        16
1921      -3               72            -216         9
1922      -2               70            -140         4
1923      -1               70             -70         1
1924       0               64               0         0
1925       1               67              67         1
1926       2               67             134         4
1927       3               60             180         9
1928       4               61             244        16
1929       5               60             300        25
1930       6               56             336        36
1931       7               53             371        49
1932       8               50             400        64
1933       9               48             432        81

                  ΣY = 1,275      Σx_1 Y = -1,075    Σx_1² = 570
                  M_Y = 67.1


From Table 78,

$$b_1 = \frac{-1,075}{570} = -1.886,$$


¹ If the series includes an even number of years, one of them may be
dropped to give an odd number, so that the convenient short formulas (151)
and (152) may be used, instead of the more laborious normal equations for a
straight line given in Chap. X.













and

$$a_1 = 67.1,$$

so that, approximately,

$$Y_1 = 67 - 1.89x_1 \qquad (153a)$$

is the equation of the straight-line secular trend fitted to infant
mortality rates in Wisconsin with origin at 1924.

Similarly, from Table 79, we find the equation of the straight-line
trend through infant mortality rates in the original registration
area of the United States to be approximately

$$Y_2 = 77 - 2.75x_2. \qquad (153b)$$


Table 79.—Fitting a Straight Line to the Original Registration
Area Data of Table 77

Year     Year      Infant death rate     x_2 Y      x_2²
         (x_2)            (Y)

1915      -9              100            -900       (see Table 78)
1916      -8              100            -800
1917      -7               96            -672
1918      -6              106            -636
1919      -5               89            -445
1920      -4               90            -360
1921      -3               79            -237
1922      -2               79            -158
1923      -1               79             -79
1924       0               72               0
1925       1               74              74
1926       2               75             150
1927       3               64             192
1928       4               67             268
1929       5               65             325
1930       6               62             372
1931       7               60             420
1932       8               55             440
1933       9               53             477

                  ΣY = 1,466      Σx_2 Y = -1,569    Σx_2² = 570
                  M_Y = 77.1


We are now in a position to answer the question, Does the
trend line for Wisconsin or that for the original registration area
have the steeper slope? We see that the slope of the Wisconsin
line is $b_1 = -1.89$, whereas the slope of the original registration
area line is $b_2 = -2.75$. The negative signs mean that as x
increases, i.e., as time passes, Y, the infant death rate, decreases.
Evidently, the trend of infant mortality has been decreasing
2.75/1.89 = 1.5 times as fast in the original registration area as in
Wisconsin. The two lines of trend are plotted in Fig. 55 by
substituting appropriate values of x in equations (153a) and
(153b). For example, the ordinate of the line through the
Wisconsin data is, if $x_1 = -9$, $Y_1 = 67 - 1.89(-9) = 84$; and
if $x_1 = 9$, $Y_1 = 67 - 1.89(9) = 50$; so that the line is drawn
through the points (-9, 84) and (9, 50).

In terms of percentages (the fall in the trend value from 1915 to
1933, taken as a per cent of the 1915 trend value and divided by the
19 years), the infant mortality rate declined an
average of 2.13 per cent per year in Wisconsin, as compared with
2.56 per cent in the original registration area.
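
The fitting of the Wisconsin trend line can be sketched in Python as follows. This is a minimal illustration under the same assumption as the text, namely an odd number of consecutive years with the origin at the midyear, so that the x deviations sum to zero; the function name `fit_trend` is hypothetical.

```python
# Wisconsin infant mortality rates, 1915-1933 (Table 78).
years = list(range(1915, 1934))
rates = [78, 86, 78, 79, 79, 77, 72, 70, 70, 64,
         67, 67, 60, 61, 60, 56, 53, 50, 48]


def fit_trend(years, values):
    """Least-squares line Y = a + b*x with the origin at the midyear.

    Assumes an odd number of equally spaced years, so the x deviations
    sum to zero and the short formulas apply.
    """
    n = len(years)
    if n % 2 == 0:
        raise ValueError("expected an odd number of years")
    mid = years[n // 2]
    x = [year - mid for year in years]   # deviations from the midyear
    a = sum(values) / n                   # a is the mean of Y
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    return mid, a, b


mid, a, b = fit_trend(years, rates)
print(mid, round(a, 1), round(b, 3))
```

With the Table 78 figures this gives an origin at 1924, a of about 67.1, and b of about -1.886, in agreement with the hand computation.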

2. The Secular Trend: A Moving Average.—It is an important
principle that any line or curve used to represent the secular
trend of a series should be rather simple in form: a straight
line if that is at all reasonable, otherwise seldom anything more
complex than a second degree parabola ($Y = a + bX + cX^2$).
The reasons are that a trend line that follows the original data
too closely includes cyclical variations from which the secular
trend should be freed, and it also fails to fulfill the primary
purpose of a trend line, which is to show clearly the general
direction, up or down, in which the series is moving.

Of course, a straight line may be a very poor fit for some 
series, so that if we want to generalize the trend without doing 
too much violence to the data we may need to fit another kind of 
curve, say, a parabola. Although the formulas differ, the general 
principles are the same. 

A second method of determining secular trend, which usually
allows the trend line to follow the original data more closely
than a straight line does, should be explained. This is the
method of the moving average, which is shown in Table 80.
It is again preferable to average an odd number of years, because
the results can then be more conveniently centered at a given
year in the series. If cycles appear in the original series, the
length of the moving average should be equal to the average
period of a cycle from peak to peak, or some multiple thereof, if
the purpose is to represent the secular trend. But if the moving
average is used only to smooth out random fluctuations, its
length should be less than that of an average cycle period.
The shorter the period of the moving average, the more flexible
is the resulting curve. Inspection of Fig. 55 suggests the presence
of possible cycles of about seven years in length in both
series. Accordingly, moving averages of seven years are shown
in Table 80.


Table 80.—Seven-year Moving Averages of Infant Mortality Rates
in Wisconsin and in the Original Registration Area of the
United States, 1915-1933

              Mortality rates            Seven-year moving averages
Year     Wisconsin   Registration       Wisconsin   Registration
                     area                           area

1915        78          100
1916        86          100
1917        78           96
1918        79          106                78            94
1919        79           89                77            91
1920        77           90                75            88
1921        72           79                73            85
1922        70           79                71            80
1923        70           79                70            78
1924        64           72                67            75
1925        67           74                66            73
1926        67           75                64            71
1927        60           64                62            68
1928        61           67                61            67
1929        60           65                58            64
1930        56           62                55            61
1931        53           60
1932        50           55
1933        48           53



The method is simply to add the first seven values of the series,
and divide by 7. Thus, for the Wisconsin series, we have
(78 + 86 + 78 + 79 + 79 + 77 + 72)/7 = 549/7 = 78.4. Then
the first value in the table, 78, is dropped, and the eighth value,
70, is added, and again the sum is divided by 7:

(549 - 78 + 70)/7 = 541/7 = 77.3;

and so on.

Notice that a disadvantage of the moving average is that it
reduces the length of the series by one less than the number of
years averaged, or in this case 7 - 1 = 6 years. When the
moving averages are plotted as large dots in Fig. 55, it is
seen that they give trend lines that agree very closely with the
straight lines of best fit, especially in the case of the Wisconsin
data.
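
The drop-one, add-one procedure just described can be sketched in Python. This is an illustrative sketch only; the function name `moving_average` and the use of `None` for years without a centered average are assumptions made here.

```python
# Wisconsin infant mortality rates, 1915-1933 (Table 80).
rates = [78, 86, 78, 79, 79, 77, 72, 70, 70, 64,
         67, 67, 60, 61, 60, 56, 53, 50, 48]


def moving_average(values, length=7):
    """Centered moving average of odd length.

    Positions too near either end of the series to have a full
    window are left as None.
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so the average can be centered")
    half = length // 2
    result = [None] * len(values)
    window_sum = sum(values[:length])
    result[half] = window_sum / length
    for i in range(length, len(values)):
        # Drop the oldest value and add the next one, as in the text.
        window_sum += values[i] - values[i - length]
        result[i - half] = window_sum / length
    return result


averages = moving_average(rates, 7)
for year, avg in zip(range(1915, 1934), averages):
    print(year, "" if avg is None else round(avg, 1))
```

The first two averages, centered at 1918 and 1919, are 549/7 and 541/7, and six of the 19 years (three at each end) receive no average.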

It is helpful in selecting a secular trend to note that "if the
actual data fall consistently above or below a line of trend for a
considerable period, it is probable that the fit is not good."¹
This is not the case in Fig. 55.

¹ F. C. Mills, Statistical Methods, p. 290, Henry Holt and Company, Inc.,
New York, 1924.

3. Short-term Cycles.—The cycles in the Wisconsin and
registration area series may be shown more clearly than in Fig. 55
by expressing the original rates as percentages of the trend, using
for the latter either the values lying on the straight lines of best
fit or the moving averages just found. If we choose the former,
the results are shown in the last two columns of Table 81. Thus,
from Table 80, for 1915, we have 78, and from Table 81, 84.01, so
that 100(78/84.01) = 92.85. Any cyclical tendencies in these
percentages of trend will stand out even more if we subtract
100 per cent from each of them, thus expressing them as positive
and negative deviations. This is done in Table 82,¹ and the
resulting cyclical deviations are plotted in Fig. 56.

Fig. 56.—Infant mortality rates in Wisconsin and the original registration
area of the United States, 1915-1933: cyclical deviations from linear trends.
(From Table 82.)

From Fig. 56 it appears that only short and erratic cycles
occur in infant mortality rates in Wisconsin and in the original
registration area over the period 1915 through 1933. Slightly
different results would have been obtained if the moving average
instead of the straight line had been used as the index of trend.

Table 81.—Infant Mortality Rates in Wisconsin and in the Original
Registration Area of the United States, 1915-1933
Straight-line Trend Values and Observed Values as Percentages of the
Trend Values

             Linear trend values          Observed rates as
                                          per cent of trend
Year     Wisconsin   Registration     Wisconsin   Registration
                     area                         area

1915       84.01       101.75           92.85        98.28
1916       82.12        99.00          104.72       101.01
1917       80.23        96.25           97.22        99.74
1918       78.34        93.50          100.84       113.37
1919       76.45        90.75          103.34        98.07
1920       74.56        88.00          103.27       102.27
1921       72.67        85.25           99.08        92.67
1922       70.78        82.50           98.90        95.76
1923       68.89        79.75          101.61        99.06
1924       67.00        77.00           95.52        93.51
1925       65.11        74.25          102.90        99.66
1926       63.22        71.50          105.98       104.90
1927       61.33        68.75           97.83        93.09
1928       59.44        66.00          102.62       101.52
1929       57.55        63.25          104.26       102.77
1930       55.66        60.50          100.61       102.48
1931       53.77        57.75           98.57       103.90
1932       51.88        55.00           96.38       100.00
1933       49.99        52.25           96.02       101.44

¹ Notice that the first two columns of Table 82 should each sum to zero.
They fail to do so because we disregarded decimals in the equations of the
lines of best fit.











Table 82.—Infant Mortality Rates in Wisconsin and the Original
Registration Area of the United States, 1915-1933
Percentage Deviations from Straight-line Trends

          Percentage deviations
               from trend
Year     Wisconsin   Registration      x'²        y'²       x'y'
           (x')      area (y')

1915      -7.15       - 1.72          51.12       2.96      12.30
1916      +4.72       + 1.01          22.28       1.02       4.77
1917      -2.78       - 0.26           7.73        .07       0.72
1918      +0.84       +13.37            .71     178.76      11.23
1919      +3.34       - 1.93          11.16       3.72     - 6.45
1920      +3.27       + 2.27          10.69       5.15       7.42
1921      -0.92       - 7.33            .85      53.73       6.74
1922      -1.10       - 4.24           1.21      17.98       4.66
1923      +1.61       - 0.94           2.59        .88     - 1.51
1924      -4.48       - 6.49          20.07      42.12      29.08
1925      +2.90       - 0.34           8.41        .12     - 0.99
1926      +5.98       + 4.90          35.76      24.01      29.30
1927      -2.17       - 6.91           4.71      47.75      14.99
1928      +2.62       + 1.52           6.86       2.31       3.98
1929      +4.26       + 2.77          18.15       7.67      11.80
1930      +0.61       + 2.48            .37       6.15       1.51
1931      -1.43       + 3.90           2.04      15.21     - 5.58
1932      -3.62         0.00          13.10       0.00       0.00
1933      -3.98       + 1.44          15.84       2.07     - 5.73

Total     +2.52       + 3.50         233.65     411.68     118.26


To compare the amounts of fluctuation of the two series
around the line of trend, the percentage deviations of Table 82
are squared and summed, and the square root of their mean is
taken, giving for the Wisconsin series,

$$\sigma_{x'} = \sqrt{\frac{233.65}{19}} = 3.51,$$

and for the original registration area,

$$\sigma_{y'} = \sqrt{\frac{411.68}{19}} = 4.65.$$

We therefore conclude that the original registration area series is
1.32 times as variable as the Wisconsin series. Some of this
difference is due to the abnormal rates of the war year 1918.
We would expect such a result, as conditions affecting infant
health are probably more variable over the whole registration
area than in the single state of Wisconsin.
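
The percentages of trend and the measure of fluctuation around the trend can be sketched in Python. This is a hedged illustration: the function names are hypothetical, the trend lines are the rounded equations (153a) and (153b), and because intermediate values are not rounded as in the hand tables, the results agree with Tables 81 and 82 only to about two decimal places.

```python
import math

# Observed rates, 1915-1933 (Table 80).
wisconsin = [78, 86, 78, 79, 79, 77, 72, 70, 70, 64,
             67, 67, 60, 61, 60, 56, 53, 50, 48]
registration = [100, 100, 96, 106, 89, 90, 79, 79, 79, 72,
                74, 75, 64, 67, 65, 62, 60, 55, 53]


def percent_of_trend(rates, a, b):
    """Observed rates as per cent of the trend line Y = a + b*x,
    with x measured from the midyear of the (odd-length) series."""
    half = len(rates) // 2
    xs = range(-half, half + 1)
    return [100 * y / (a + b * x) for x, y in zip(xs, rates)]


def rms_deviation(percentages):
    """Root mean square of the percentage deviations from trend."""
    deviations = [p - 100 for p in percentages]
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


wis_pct = percent_of_trend(wisconsin, 67, -1.89)      # equation (153a)
reg_pct = percent_of_trend(registration, 77, -2.75)   # equation (153b)
sigma_wis = rms_deviation(wis_pct)
sigma_reg = rms_deviation(reg_pct)
print(round(wis_pct[0], 2), sigma_wis, sigma_reg, sigma_reg / sigma_wis)
```

The two root-mean-square deviations come out close to the 3.51 and 4.65 found by hand, and their ratio is roughly 1.3.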

4. Correlation between the Short-term Cycles of Two Time
Series.—Inspection of Fig. 56 shows that infant mortality rates
tend to rise and fall together in Wisconsin and in the original
registration area. This resemblance between the apparently
erratic fluctuations of the two series may be symptomatic of the
existence of general factors that produce cycles in infant deaths.
The point is important enough to test with some care. We may
ask, just how much relationship is there between the variations
in infant mortality rates in Wisconsin and in the original registration
area? To answer this question we need to know the value
of the coefficient of correlation between the two time series,
taking the deviations from the trend lines, as given in Table 82,
instead of from the means of the series. It will be recalled
that the formula for the Pearsonian coefficient of correlation is

$$r = \frac{\sum x'y' - N\left(\frac{\sum x'}{N}\right)\left(\frac{\sum y'}{N}\right)}{N\sigma_{x'}\sigma_{y'}}.$$

Taking the sum of the cross products, $\sum x'y' = 118.26$, from
Table 82, N = 19 years from 1915 to 1933 inclusive, $\sigma_{x'} = 3.51$,
and $\sigma_{y'} = 4.65$, as found above, we have

$$r = \frac{118.26 - 19\left(\frac{2.52}{19}\right)\left(\frac{3.50}{19}\right)}{19(3.51)(4.65)} = \frac{118.26 - 0.46}{310.11} = 0.38,$$

$$r^2 = 0.14.$$

So that the relationship between infant mortality rates in
Wisconsin and in the original registration area from year to
year enables us to predict one from a knowledge of the other
only 14 per cent more accurately than if we judged one of the
series from a knowledge of its own mean and variance.

Could it be that a correlation coefficient of r = .38 is due to
random accidental correspondence between the cyclical fluctuations
in the two series? Although we are dealing here with two
historical series, we have removed the secular trend, and this is
sometimes regarded as warrant for applying the standard error
to this situation. An inspection of the cycles in Fig. 56, however,
suggests that some correlation between successive years
still remains, so that we can hardly assume that the death
rates in our series, regarded as a sample, are independent of one
another. Under these conditions, the basic assumptions of
simple sampling underlying the standard error formula which is
appropriate in this case, viz., $\epsilon_r = 1/\sqrt{N - 1}$, are violated;
so we are unable to answer the question asked at the beginning
of the paragraph. However, the absence of much correlation
between the Wisconsin rates and the original registration area
rates suggests that the control of infant mortality is primarily
a local problem. This should be further tested by comparing
infant mortality rates in Wisconsin with those in adjoining
states.
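
The correlation between the two sets of percentage deviations from trend can be checked with a short Python sketch. It is offered as an illustration only: the function name is hypothetical, the data are the deviations of Table 82, and, following the text, each sigma is taken as the root mean square of the deviations about the trend rather than about their own mean.

```python
import math

# Percentage deviations from trend, 1915-1933 (Table 82).
x_dev = [-7.15, 4.72, -2.78, 0.84, 3.34, 3.27, -0.92, -1.10, 1.61, -4.48,
         2.90, 5.98, -2.17, 2.62, 4.26, 0.61, -1.43, -3.62, -3.98]
y_dev = [-1.72, 1.01, -0.26, 13.37, -1.93, 2.27, -7.33, -4.24, -0.94, -6.49,
         -0.34, 4.90, -6.91, 1.52, 2.77, 2.48, 3.90, 0.00, 1.44]


def trend_deviation_correlation(xs, ys):
    """Correlation of two series of deviations from their trend lines."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Root mean square of the deviations about the trend.
    sigma_x = math.sqrt(sum(x * x for x in xs) / n)
    sigma_y = math.sqrt(sum(y * y for y in ys) / n)
    return (sum_xy - n * mean_x * mean_y) / (n * sigma_x * sigma_y)


r = trend_deviation_correlation(x_dev, y_dev)
print(round(r, 2), round(r * r, 2))
```

Computed this way, r is about 0.38 and its square about 0.14.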

It is just as important to avoid the distorting effects of one
or a few atypical, extreme values in correlating time series as in
other correlation problems (see Chap. X, Table 49). For example,
in Fig. 56 it appears that the war year 1918 was decidedly
abnormal in its infant mortality rate, and the same is to some
extent true of the depression year, 1933. In those two years
there is much less agreement than usual between the two series.
If we are interested primarily in knowing the amount of correlation
between infant death rates in Wisconsin and in the registration
area in normal years, it is, of course, desirable to omit the
two atypical years from the computation of the correlation
coefficient. This would make it necessary to fit a new trend
line to the remaining 17 years of the series, and find the coefficient
of correlation between the percentage deviations from it. In
case we do not want to confine the investigation of the amount
of association between the two series to "normal" years, which
are not always easy to define objectively, and yet we do want to
reduce the influence of the extreme or atypical values, it is
probably advisable to resort to the coefficient of rank correlation.
This coefficient, $\rho$, is calculated from Table 83, and has a value
of .41.

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(678)}{19(19^2 - 1)} = .41.$$



Table 83.—Rank of Observed Rates as Per Cent of Trend

Year     Wisconsin    Registration      D        D²
                      area

1915         1             6           - 5       25
1916        18            11             7       49
1917         5             9           - 4       16
1918        11            19           - 8       64
1919        16             5            11      121
1920        15            14             1        1
1921         9             1             8       64
1922         8             4             4       16
1923        12             7             5       25
1924         2             3           - 1        1
1925        14             8             6       36
1926        19            18             1        1
1927         6             2             4       16
1928        13            13             0        0
1929        17            16             1        1
1930        10            15           - 5       25
1931         7            17           -10      100
1932         4            10           - 6       36
1933         3            12           - 9       81

Total                                           678


As expected, the result of using ranks in this case is to increase 
the amount of correlation somewhat. 
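
The ranking and the rank correlation can be sketched in Python. This is an illustration under stated assumptions: the function names are hypothetical, the ranks are taken from the percentage deviations of Table 82 (which order the years exactly as the percentages of trend do), and no provision is made for tied values, since none occur in these data.

```python
# Percentage deviations from trend, 1915-1933 (Table 82).
x_dev = [-7.15, 4.72, -2.78, 0.84, 3.34, 3.27, -0.92, -1.10, 1.61, -4.48,
         2.90, 5.98, -2.17, 2.62, 4.26, 0.61, -1.43, -3.62, -3.98]
y_dev = [-1.72, 1.01, -0.26, 13.37, -1.93, 2.27, -7.33, -4.24, -0.94, -6.49,
         -0.34, 4.90, -6.91, 1.52, 2.77, 2.48, 3.90, 0.00, 1.44]


def ranks(values):
    """Rank from 1 (smallest) to N (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_correlation(xs, ys):
    """Return (sum of squared rank differences, rank correlation)."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n * n - 1))


sum_d2, rho = rank_correlation(x_dev, y_dev)
print(sum_d2, round(rho, 2))
```

The sum of squared rank differences is 678, as in Table 83, and the coefficient rounds to .41.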

It often happens that the correlation of two time series is
greater if one of them is lagged one or more years, so that the
cycles correspond more closely. For example, if the marriage
rate declines sharply, so does the birth rate, but not until about
a year later. Therefore, to test the relationship between marriage
and birth rates, the latter should be lagged by one year.
That is, say, the 1930 birth rate should be paired with the 1929
marriage rate, etc. There is no indication that a lag is needed
in correlating the two series with which we were dealing above.
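
The pairing of a lagged series can be sketched in Python. The figures below are invented purely for illustration and are not taken from any table in this book; the function name `lagged_pairs` is likewise hypothetical.

```python
# Hypothetical rates per 1,000 population, keyed by year (illustrative only).
marriage_rate = {1928: 9.8, 1929: 10.1, 1930: 9.2, 1931: 8.6}
birth_rate = {1929: 18.8, 1930: 18.9, 1931: 18.0, 1932: 17.4}


def lagged_pairs(leading, following, lag=1):
    """Pair each year's leading value with the following series `lag` years later.

    Years for which the lagged value is unavailable are skipped.
    """
    return [
        (year, leading[year], following[year + lag])
        for year in sorted(leading)
        if year + lag in following
    ]


pairs = lagged_pairs(marriage_rate, birth_rate, lag=1)
for year, marriage, birth in pairs:
    print(year, marriage, year + 1, birth)
```

The paired values, rather than the same-year values, would then be carried into whichever correlation formula is being used.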

5. Seasonal Fluctuations.—Data such as infant mortality
rates may be obtained by months as well as by years. This
affords an opportunity to study the seasonal fluctuations in
infant deaths, t.e., the variations in death rates that are associ- 














Table 84.—Infant Mortality Rates by Months, United States
Registration Area, 1928-1935*

Columns: (1a) year; (1b) month; (2) infant mortality, observed rate per
1,000 live births in same month; (3) monthly trend rates; (4) observed
rate as per cent of trend, (2) ÷ (3) × 100; (5) monthly averages of
observed rates as per cent of trend (Table 87); (6) seasonal index;
(7) cycles, (4) - (6).

(1a)    (1b)      (2)      (3)       (4)       (5)       (6)       (7)

1928    Jan.     72.4     69.24    104.56    113.34    113.35    - 8.79
        Feb.     73.2     69.09    105.95    112.19    112.20    - 6.25
        Mar.     74.8     68.93    108.52    108.55    108.56    -  .04
        Apr.     75.0     68.78    109.04    103.38    103.39    + 5.65
        May      70.4     68.62    102.59     97.48     97.49    + 5.10
        June     64.2     68.47     93.76     93.59     93.60    +  .16
        July     60.8     68.31     89.01     90.09     90.10    - 1.09
        Aug.     60.2     68.16     88.32     87.03     87.04    + 1.28
        Sept.    63.4     68.00     93.24     91.20     91.21    + 2.03
        Oct.     64.3     67.84     94.78     97.09     97.10    - 2.32
        Nov.     65.2     67.69     96.32     97.83     97.84    - 1.52
        Dec.     81.3     67.53    120.39    108.07    108.08    +12.31
1929    Jan.     99.1     67.39    147.05    113.34    113.35    +33.70
        Feb.     84.8     67.22    126.15    112.19    112.20    +13.95
        Mar.     74.3     67.07    110.78    108.55    108.56    + 2.22
        Apr.     66.1     66.91     98.79    103.38    103.39    - 4.60
        May      63.9     66.76     95.72     97.48     97.49    - 1.77
        June     57.8     66.60     86.79     93.59     93.60    - 6.81
        July     55.7     66.45     83.82     90.09     90.10    - 6.28
        Aug.     57.7     66.29     87.04     87.03     87.04      0.00
        Sept.    63.4     66.13     95.87     91.20     91.21    + 4.66
        Oct.     64.9     65.98     98.36     97.09     97.10    + 1.26
        Nov.     59.4     65.82     90.25     97.83     97.84    - 7.59
        Dec.     65.2     65.67     99.28    108.07    108.08    - 8.80
1930    Jan.     67.8     65.51    103.50    113.34    113.35    - 9.85
        Feb.     69.8     65.36    106.79    112.19    112.20    - 5.41
        Mar.     69.3     65.20    106.29    108.55    108.56    - 2.27
        Apr.     68.2     65.05    104.84    103.38    103.39    + 1.45
        May      62.5     64.89     96.32     97.48     97.49    - 1.17
        June     61.4     64.74     94.84     93.59     93.60    + 1.24
        July     59.3     64.58     91.82     90.09     90.10    + 1.72
        Aug.     56.0     64.43     86.92     87.03     87.04    -  .12
        Sept.    61.7     64.27     96.00     91.20     91.21    + 4.79
        Oct.     67.1     64.12    104.65     97.09     97.10    + 7.55
        Nov.     63.5     63.96     99.28     97.83     97.84    + 1.44
        Dec.     69.8     63.81    109.39    108.07    108.08    + 1.31
1931    Jan.     75.3     63.65    118.30    113.34    113.35    + 4.95
        Feb.     74.6     63.49    117.50    112.19    112.20    + 5.30
        Mar.     70.4     63.34    111.15    108.55    108.56    + 2.59
        Apr.     65.7     63.18    103.99    103.38    103.39    +  .60
        May      56.4     63.03     89.48     97.48     97.49    - 8.01
        June     53.5     62.87     84.94     93.59     93.60    - 8.66
        July     54.1     62.72     86.26     90.09     90.10    - 3.84
        Aug.     54.3     62.56     86.80     87.03     87.04    -  .24
        Sept.    58.5     62.41     93.73     91.20     91.21    + 2.52
        Oct.     61.0     62.25     97.99     97.09     97.10    +  .89
        Nov.     58.1     62.10     93.56     97.83     97.84    - 4.28
        Dec.     57.3     61.94     92.51    108.07    108.08    -15.57
1932    Jan.     56.5     61.79     91.44    113.34    113.35    -21.91
        Feb.     57.5     61.63     93.30    112.19    112.20    -18.90


Mar. 

62.8 

61.48 

102.15 


108.56 

- 6.41 


Apr. 

May 


61.32 

97.85 

103.38 


- 5.54 


67.8 

61.17 

94.49 

97.48 

97.49 

- 3.00 


June 

56.1 

61.01 

91.95 

93.59 

93.60 

- 1.65 


July 

65.2 

60.85 



90.10 

+ .61 


Aug. ^ 

50.9 


83.86 

87.03 

87.04 

- 3.18 


Sept. 

49.7 

60.54 


91.20 


- 9.12 


Oot. 

52.5 

60.39 

86.93 

97.09 

97.10 

-10.17 


Nov. 

60.4 

60.23 

100.28 

97.83 

97.84 

+ 2.44 


Deo. 

73.0 

60.08 

121.50 



+ 13.42 


* From Birtht, StUlhirth^, and Jh^ani Mortality, U. S. Bureau of the Census, annual 
publication. 












































290 


ELEMENTARY SOCIAL STATISTICS 


Tabus 84 . —^Infant Mortautt Rates bt Months, Unites States 
Registration Area, 1928-1935.* — (Continued) 


Ymt 

(1«) 

Month 

(IW 

Infant 
mortality 
rate per 
1.000 live 
births in 
same 
month 

(2) 

Monthly 

trend 

rates 

(3) 

Observed 
rate as 
per cent 
of trend 
(2)+ (3) X100 

(4) 

Monthly 
averages 
of observed 
rates as 
per cent 
of trend 
(Table 87) 

(5) 

Seasonal 

index 

(6) 

Cycles 

(4)-(6) 

(7) 

1933 

Jan. 

71.2 

59.92 

118.83 

113.34 

113.35 

+ 

5.48 


Feb. 

69.9 

69.77 

116.95 

112.19 

112.20 


4.75 


Mar. 

60.1 

59.61 

100.82 

108.55 

108.56 


7.74 



56.3 

59.46 

94.69 

103.38 

103.39 

— 

8.70 



54.7 

59.30 

92.24 

97.48 

97.49 


6.26 


June 

56.2 

59.15 

95.01 

93.59 

93.60 

4- 

1.41 


July 

51.9 

68.99 

87.98 


90.10 


2.12 


Aug. 

50.1 

58.84 

85.15 


87.04 

— 

1.89 


Sept. 

54.9 

58.68 

93.56 


91.21 

+ 

2.35 


Oct. 

68.7 

58.52 

100.31 


97.10 

+ 

3.21 


Nov. 

58.0 

58.37 

99.37 

97.83 

97.84 

4- 

1.53 


Dec. 

68.5 

58.21 

100.50 

108.07 

108.08 


7.58 

1034 

Jan. 

60.6 

58.06 

104.37 

113.34 

113.35 

_ 

8.08 


Feb. 

66.5 

57.90 

114.85 

112.19 

112.20 

4* 

2.65 


Mar. 

67.7 

57.75 

117.23 

108.55 

108.56 

4* 

8.67 


Apr. 

64.9 

57.69 

112.69 

103.38 

103.39 

4- 

9.30 


May 

60.8 

57.44 

105.85 

97.48 

97.49 

4- 

8.36 


sJune 

60.2 

57.28 

105.10 

93.59 

93.60 

4-11.50 


July 

58.3 

57.13 

102.05 

90.09 

90.10 

4*11.96 


Aug. 

52.2 

56.97 

91.63 

87.03 

87.04 

4* 

4.59 


Sept. 

51.6 

66.82 

90.81 

91.20 

91.21 


.40 


Oct. 

67.0 

56.66 

100.60 

97.09 

97.10 

+ 

3.50 


Nov. 

69.3 

56.51 

104.94 

97.83 

97.84 

-f 

7.10 


Deo. 

63.8 

56.35 

113.22 

108.07 

108.08 

+ 

5.14 

1935 

Jan. 

66.7 

66.19 

118.70 

113.34 

113.35 

+ 

5.35 


Feb. 


66.04 

115.99 

112.19 

112.20 

+ 

3.79 


Mar. 

62.3 

55.88 

111.49 

108.55 

108.56 

4- 

2.93 


Apr. 

68.6 

55.73 

105.15 

103.38 

103.39 

4- 

1.76 


May 

57.3 

55.57 

103.11 

97.48 

97.49 

4- 

5.62 


June 

53.4 

55.42 

96.36 

93.59 

93.60 

4- 

2.76 


July 

49.2 

65.26 


90.09 

90.10 

4- 

1.07 


Aug. 

47.7 

56.11 

86.55 

87.03 

87.04 


.49 


Sept. 

46.3 

54.95 

84.26 

91.20 

01.21 


6.95 


Oct. 


64.80 

93.07 

97.09 

97.10 

— 

4.03 


Nov. 

53.9 

54.64 

98.65 

97.83 

97.84 

4- 

.81 


Dec. 

58.7 

54.49 

107.73 

108.07 

108.08 


.35 


*From Births, Stillbirths, and Infant Mortality, U. B. Bureau of the Census, annual 
publication. 


ated with spring, summer, fall, and winter. To do this, we must 
first separate the seasonal fluctuations from the secular trend, 
the short-term cycles, and the random fluctuations, all of which 
appear in the original monthly rates given in col. (2) of Table 84. 
We average the 12 monthly rates in each year in Table 84 to 
obtain annual rates, which are entered in Table 85, and plotted 
in Fig. 67. Inspection of Fig. 57 shows a decline in the infant 
mortality rate in five out of seven years, and suggests that a 
straight line probably is most appropriate to represent the 
secular trend. Table 85 shows the calculations needed to fit a 
linear trend to the annual rates by the method of least squares. 
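
The averaging step can be sketched as follows, using the twelve 1928 monthly rates from col. (2) of Table 84. The function name `annual_rate` is an assumption introduced for illustration; the book itself gives no code.

```python
def annual_rate(monthly_rates):
    """Simple average of the 12 monthly rates of one year."""
    if len(monthly_rates) != 12:
        raise ValueError("expected exactly 12 monthly rates")
    return sum(monthly_rates) / 12


# 1928 monthly infant mortality rates, col. (2) of Table 84 (Jan. through Dec.).
rates_1928 = [72.4, 73.2, 74.8, 75.0, 70.4, 64.2,
              60.8, 60.2, 63.4, 64.3, 65.2, 81.3]

print(round(annual_rate(rates_1928), 3))  # 68.767, the 1928 entry of Table 85
```

This is an unweighted mean of monthly rates, as the text describes. It is not weighted by the number of births in each month.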


































Fig. 57.—Annual infant mortality rates in the registration area of the United 
States, 1928-1935. (From Table 85.) 



Table 85.—Values Needed for Fitting a Straight Line to the Annual 
Infant Mortality Rates in the Registration Area of the 
United States, 1928-1935 

         Infant      Year                                      Deviations
Year     death       code        X'Y       X'^2     Trend      from trend
         rate (Y)    (X')                           values

1928     68.767      -3       -206.301      9       68.388      + .379
1929     67.692      -2       -135.384      4       66.524      +1.168
1930     64.700      -1       - 64.700      1       64.660      + .040
1931     61.592       0          0.000      0       62.796      -1.204
1932     57.700      +1         54.700      1       60.932      -3.232
1933     58.375      +2        116.750      4       59.068      - .693
1934     60.242      +3        180.726      9       57.204      +3.038
1935     55.842      +4        223.368     16       55.340      + .502

Total   494.910       4        169.159     44                   - .002
Mean     61.864       0.5


Substituting the values found in Table 85 in the normal equations 
for determining the constants in the equation of a straight 
line, we have 

\[
\begin{aligned}
b &= \frac{169.159 - 8(0.5)(61.864)}{44 - 8(0.25)}, \\
b &= -1.864, \\
a &= 61.864 - (-1.864)(0.5), \\
a &= 62.796,
\end{aligned}
\]

so that 

\[
\begin{aligned}
Y_c &= a + bX', \\
Y_c &= 62.796 - 1.864X'. \qquad (154)
\end{aligned}
\]
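
As a check on the arithmetic, the substitution can be written out in Python. This is a sketch under an explicit assumption: it starts from the totals and means exactly as printed in Table 85 and does not rebuild them from the yearly rows. The variable names are illustrative only.

```python
# Totals and means as printed in Table 85 (assumed taken at face value).
n = 8             # number of years, 1928-1935
sum_xy = 169.159  # printed total of the X'Y column
sum_x2 = 44       # printed total of the X'^2 column
mean_x = 0.5      # mean of the year codes X'
mean_y = 61.864   # mean annual infant death rate

# Slope and intercept from the normal equations for a straight line.
b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
a = mean_y - b * mean_x

print(round(b, 3), round(a, 3))  # -1.864 62.796
```

One caution for anyone recomputing from the rows: the 1932 line of Table 85 shows Y = 57.700 with X' = +1 but an X'Y entry of 54.700, so a total rebuilt from the Y column would differ from the printed 169.159. The sketch keeps the printed total so that it reproduces the constants quoted in the text.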


















From formula (154) the trend values shown in the next to the 
last column of Table 85 are estimated by substituting for X' its 
successive values taken from the third column of the table. The 
annual trend line is plotted in Fig. 57. The last column of 
Table 85, obtained by subtracting the trend values from the 
observed Y values, is inserted as a check on the arithmetic. 
Its sum is approximately zero, as it should be if the calculations 
are carried far enough. 
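
The trend values and the check column can be reproduced with a short sketch. It assumes the rounded constants of formula (154), a = 62.796 and b = -1.864, and the annual rates of Table 85; the names `trend` and `deviations` are illustrative.

```python
years = list(range(1928, 1936))
# Annual infant death rates (Y) from Table 85.
rates = [68.767, 67.692, 64.700, 61.592, 57.700, 58.375, 60.242, 55.842]

a, b = 62.796, -1.864                    # constants of formula (154)
codes = [year - 1931 for year in years]  # year codes X' = -3, ..., +4

trend = [a + b * x for x in codes]
deviations = [y - t for y, t in zip(rates, trend)]

for year, t, d in zip(years, trend, deviations):
    print(year, f"{t:.3f}", f"{d:+.3f}")
print("sum of deviations:", f"{sum(deviations):+.3f}")  # -0.002
```

The sum is not exactly zero only because a and b are carried to three decimals.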



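Fig. 58.—Observed monthly infant mortality rates and monthly trend, registration 
area of the United States, 1928-1935. (From Table 84.) 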


Since in each year the infant death rate declines on the 
average 1.864, in one month the decline is 1.864/12 = 0.1553. In 
Table 85 we used average annual rates, which apply to the 
middle of a year. The middle of the year falls on June 30. 
The average monthly rates, however, apply to the middle of 
each month. We, therefore, enter Table 84 at June, 1928, and 
add to the annual 1928 trend rate of 68.388 one-half of the 
correction factor, 0.1553, so that we have 

68.388 + .0777 = 68.4657 , 

as the June, 1928, monthly trend in col. (3) of Table 84. We 
then add 0.1553 cumulatively to this rate for the five preceding 
months in 1928, and subtract 0.1553 cumulatively from it for 
each subsequent month throughout the eight-year period, which 
completes col. (3). The monthly trend line, which is identical 
with the annual trend line, and the observed monthly rates from 
col. (2) of Table 84, are plotted in Fig. 58. From this graph, it is 
seen that in spite of a general downward trend in infant mortality 
rates, these rates have fluctuated considerably, so that even in, 
say, early 1935 they were much higher than in the middle of 1928. 
How much of this variation is due to the season of the year? 
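
The construction of col. (3) can be sketched as below. The sketch assumes the rounded monthly step of 0.1553 used in the text and anchors the series at June, 1928; the function name `monthly_trend` is invented for illustration. Because of rounding, an individual value may differ from the printed column by 0.01.

```python
ANNUAL_DECLINE = 1.864   # yearly fall in the trend, from formula (154)
TREND_1928 = 68.388      # annual trend value for 1928 (centered on June 30)

monthly_step = round(ANNUAL_DECLINE / 12, 4)   # 0.1553
# Mid-June lies half a month before June 30, so the trend is slightly higher.
june_1928 = TREND_1928 + monthly_step / 2


def monthly_trend(year, month):
    """Trend rate at the middle of the given month (month: 1 = Jan., 12 = Dec.)."""
    months_after_june_1928 = (year - 1928) * 12 + (month - 6)
    return june_1928 - monthly_step * months_after_june_1928


print(round(monthly_trend(1928, 1), 2))    # 69.24
print(round(monthly_trend(1928, 6), 2))    # 68.47
print(round(monthly_trend(1935, 12), 2))   # 54.49
```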


Table 86.—Frequency Distribution of Observed Infant Mortality 
Rates Expressed as Percentages of Trend, by Months, United 
States Registration Area, 1928-1935* 

Observed
rates, per
cent of
trend      Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.    Sept.  Oct.   Nov.   Dec.

145-149    /
140-144
135-139
130-134
125-129           /
120-124                                                                                  //
115-119    ///    ///    /
110-114           /      ///    /                                                        /
105-109           //     //     //     /      /                                          //
100-104    ///           //     //     //            /                     ///    //     /
 95- 99                         //     //     //                    //     //     ////   /
 90- 94    /      /             /      //     ///    //     /       ////   //     //     /
 85- 89                                /      /      ////   //////         /
 80- 84                                       /      /      /       //

* From col. (4), Table 84. 


Column (4) of Table 84 shows the monthly observed rates 
expressed as percentages of the monthly secular trend rates. 
These percentages represent the seasonal variations combined 
with the short-term cycles and the random fluctuations, but 
with the secular trend eliminated. To remove the short-term 
cycles and random fluctuations, it is necessary to average the 
percentages for each month over the eight-year period. As an 
aid in revealing whether or not a seasonal movement actually 
exists, and in choosing the most stable kind of monthly average, 
Table 86 is set up. A glance at it shows clearly the presence of a 
seasonal pattern in infant mortality. The death rate is high in 
the winter and low in the late summer. From the arrangement 
of the frequencies in the several columns, it appears that, except 
possibly in January, the arithmetic mean is a suitable average 
to use in this case. As a rule, however, it is recommended to 
average the middle three or four values for each month, a sort 
of combined mean and median average which avoids the distortion 
due to extreme values. In Table 87 the mean monthly 
values are found* and are entered in col. (5) of Table 84. To 
convert the 12 mean monthly percentages to index numbers, 
they are divided by their own average, 99.99, and the quotient 
multiplied by 100, to give the last row of Table 87 and col. (6) 
of Table 84. The index of seasonal variation has an advantage 
over the simple percentages of col. (5) of Table 84, in that it 
varies around a mean of exactly 100.00 per cent, and is therefore 
more generally comparable and finished in form. In the seasonal 
indexes of col. (6) of Table 84 there now remains only the seasonal 
variation, since the secular trend, cycles, and random fluctuations 
were removed by the steps just taken. An undistorted 
idea of the seasonal variation can now be obtained by plotting 
the monthly seasonal indexes around their mean of 100 per cent, 
as in Fig. 59. It is again obvious that the winter months are 
the danger period for infants. 

* From col. (4) of Table 84. 

Fig. 59.—Seasonal indexes of infant mortality rates in the registration area of the 
United States, 1928-1935. (From Table 87.) 
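
The steps from observed rate to seasonal index can be sketched in Python. The helper names (`percent_of_trend`, `middle_mean`) are assumptions made for illustration. The monthly means are those of col. (5) of Table 84, and the January percentages are those of col. (4). The `middle_mean` helper is only one possible reading of "the middle three or four values" and is not used in the book's own calculation.

```python
def percent_of_trend(observed, trend):
    """Observed rate as a percentage of the trend rate, col. (4) of Table 84."""
    return round(observed / trend * 100, 2)


def middle_mean(values, keep):
    """Mean of the `keep` central values after sorting (extremes dropped evenly)."""
    extra = len(values) - keep
    if keep <= 0 or extra < 0 or extra % 2:
        raise ValueError("keep must leave an even number of values to drop")
    ordered = sorted(values)
    core = ordered[extra // 2: len(ordered) - extra // 2]
    return sum(core) / len(core)


# Mean monthly percentages of trend, Jan. through Dec. (col. (5) of Table 84).
monthly_means = [113.34, 112.19, 108.55, 103.38, 97.48, 93.59,
                 90.09, 87.03, 91.20, 97.09, 97.83, 108.07]

grand_mean = round(sum(monthly_means) / 12, 2)   # 99.99
seasonal_index = [round(m / grand_mean * 100, 2) for m in monthly_means]

print(percent_of_trend(72.4, 69.24))   # 104.56 (January, 1928)
print(grand_mean)                      # 99.99
print(seasonal_index[0], seasonal_index[7])   # 113.35 87.04

# January percentages of trend for 1928-1935, col. (4) of Table 84.
january = [104.56, 147.05, 103.50, 118.30, 91.44, 118.83, 104.37, 118.70]
print(middle_mean(january, 4))   # mean of the four central January values
```

Dropping the two highest and two lowest January values removes the extreme 1929 figure of 147.05, which is the kind of distortion the modified average is meant to avoid.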

6. Short-term Cycles Freed from Seasonal Fluctuations.—If 
it is desired to observe the short-term cycles mixed with random 
fluctuations in the monthly infant mortality rates, freed from 
both the secular trend and the seasonal movement, this may be 
done by recording in col. (7) of Table 84 the differences between 
the percentages of trend in col. (4) and the seasonal indexes in 
col. (6), and plotting them as in Fig. 60. It appears that a number 
of other factors besides the season of the year affect the infant 
death rate, and need to be studied and brought under control. 
There is no suggestion from Fig. 60 that any progress was made 
during the eight-year period in reducing the percentage of 
infant deaths due to cyclical and random causes. The point 
might be tested by obtaining the standard deviations around 
zero of the differences in col. (7) of Table 84, for the first two 
years and the last two years of the period, and comparing the 
two standard deviations. 

Fig. 60.—Short-term cycles and random fluctuations in infant mortality rates, 
United States registration area, 1928-1935. (From Table 84.) 
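
The suggested test can be sketched as follows. The function name `sd_about_zero` is an assumption. The values are col. (7) of Table 84; where a printed sign or digit is unclear in the 1934-1935 rows, the entry is taken here as col. (4) minus col. (6), which is itself an assumption about the intended figure.

```python
from math import sqrt


def sd_about_zero(values):
    """Standard deviation measured around zero (root mean square of the values)."""
    return sqrt(sum(v * v for v in values) / len(values))


# Col. (7) of Table 84, Jan. through Dec. of each year.
cycles_1928_29 = [
    -8.79, -6.25, -0.04, 5.65, 5.10, 0.16, -1.09, 1.28, 2.03, -2.32, -1.52, 12.31,
    33.70, 13.95, 2.22, -4.60, -1.77, -6.81, -6.28, 0.00, 4.66, 1.26, -7.59, -8.80,
]
cycles_1934_35 = [
    -8.98, 2.65, 8.67, 9.30, 8.36, 11.50, 11.95, 4.59, -0.40, 3.50, 7.10, 5.14,
    5.35, 3.79, 2.93, 1.76, 5.62, 2.76, -1.07, -0.49, -6.95, -4.03, 0.81, -0.35,
]

first = sd_about_zero(cycles_1928_29)
last = sd_about_zero(cycles_1934_35)
print(f"1928-1929: {first:.2f}   1934-1935: {last:.2f}")
```

A smaller value for the last two years would point toward reduced cyclical and random variation, though with only 24 months in each group a single extreme month, such as January, 1929, can dominate the comparison.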





Table 87.—Calculation of Monthly Means of Infant Mortality 
Rates Expressed as Percentages of Trend, United States 
Registration Area, 1928-1935 

Year     Jan.    Feb.    Mar.    Apr.    May     June    July    Aug.    Sept.   Oct.    Nov.    Dec.

1928    104.56  105.95  108.52  109.04  102.59   93.76   89.01   88.32   93.24   94.78   96.32  120.39
1929    147.05  126.15  110.78   98.79   95.72   86.79   83.82   87.04   95.87   98.36   90.25   99.28
1930    103.50  106.79  106.29  104.84   96.32   94.84   91.82   86.92   96.00  104.65   99.28  109.39
1931    118.30  117.50  111.15  103.99   89.48   84.94   86.26   86.80   93.73   97.99   93.56   92.51
1932     91.44   93.30  102.15   97.85   94.49   91.95   90.71   83.86   82.09   86.93  100.28  121.50
1933    118.83  116.95  100.82   94.69   92.24   95.01   87.98   85.15   93.56  100.31   99.37  100.50
1934    104.37  114.85  117.23  112.69  105.85  105.10  102.05   91.63   90.81  100.60  104.94  113.22
1935    118.70  115.99  111.49  105.15  103.11   96.36   89.03   86.55   84.26   93.07   98.65  107.73

Total   906.75  897.48  868.43  827.04  779.80  748.75  720.68  696.27  729.56  776.69  782.65  864.52
Mean    113.34  112.19  108.55  103.38   97.48   93.59   90.09   87.03   91.20   97.09   97.83  108.07
Index   113.35  112.20  108.56  103.39   97.49   93.60   90.10   87.04   91.21   97.10   97.84  108.08


Exercises 

1 . Compare the trends in the birth rates of cities and of rural areas 
in the original registration area of the United States over the 19-year 
period, 1915 through 1933, using the data in the table below. Show 
the cyclical deviations from trend, compare the variability of the two 
series, and calculate the amount of correlation between the fluctuations 
of the two series. What should be done with the data for extremely 
atypical years, such as the war year, 1918? Is the correlation improved 
by "lagging" one of the series? Plot all data. 


Birth Rates per 1,000 Population for Cities and Rural Areas in the 
Original Registration Area of the United States, 1915-1933* 

           Birth rate                    Birth rate
Year     Cities    Rural      Year     Cities    Rural

1915      26.0     23.8       1925      22.2     20.3
1916      26.0     23.5       1926      21.5     19.1
1917      26.4     23.3       1927      21.3     19.0
1918      25.8     23.0       1928      20.5     18.1
1919      23.8     21.1       1929      19.7     16.8
1920      24.6     22.2       1930      19.3     16.7
1921      24.5     23.1       1931      1?.8     16.3
1922      22.9     21.8       1932      17.0     15.5
1923      22.9     21.1       1933      15.8     14.8
1924      23.2     21.2

* From Birth, Stillbirth, and Infant Mortality Statistics, 1936, pp. 6-6, Bureau of the 





































































2. For the relief data in the accompanying table show the secular 
trend and the seasonal fluctuations, and plot the results in each case. 


Number of Cases Receiving Relief in 385 Rural and Town Areas of 
the United States, 1932-1936* 

                               Cases
Month          1932       1933       1934       1935       1936

January       30,931     99,064    169,554    298,785    145,734
February      32,552    107,860    177,041    299,217    146,697
March         34,239    128,794    202,551    290,217    143,000
April         32,965    121,234    216,463    279,901    131,038
May           30,713    112,079    222,647      [?]      123,102
June          30,774    110,158    232,331      [?]      117,808
July          29,687    131,850    239,441      [?]      120,067
August        30,214    126,572    259,410    218,883    128,303
September     33,561    114,147    255,929      [?]      129,124
October       38,126    117,459    251,397      [?]      144,492
November      65,922    135,234      [?]      198,780    149,781
December      75,517    115,877    282,068    167,297    166,173

* Adapted from Waller Wynne, Jr., Five Years of Rural Relief, p. 36, WPA, Division of 
Social Research, 1938. 


References 

Chaddock, R. E.: Principles and Methods of Statistics, Chap. XIII, Houghton 
Mifflin Company, Boston, 1925. 

Croxton, F. E., and D. J. Cowden: Applied General Statistics, Chaps. XIV- 
XIX and XXV, Prentice-Hall, Inc., New York, 1939. 

Davies, G. R., and Dale Yoder: Business Statistics, Chaps. IV and V, 
John Wiley & Sons, Inc., New York, 1937. 

Mills, F. C.: Statistical Methods, rev. ed., Chaps. VII, VIII, and XI, Henry 
Holt and Company, Inc., New York, 1938. 

Waugh, A. E.: Elements of Statistical Method, Chap. VIII, McGraw-Hill 
Book Company, Inc., New York, 1938. 

White, R. C.: Social Statistics, Chap. XIII, Harper & Brothers, New York, 
1933. 

































Appendix 

Table 1.—Area and Ordinate of the Normal Curve* 

x/σ    Area    Ordinate (y)    x/σ    Area    Ordinate (y) 

.00 

.0000000 

.3989423 

.46 

.1772419 

.3588903 

.01 

.02 

.03 

.04 

.0039894 

.0079783 

.0119665 

.0159534 

.3989223 

.3988626 

.3987628 

.3986233 

.47 

.48 

.49 

.1808225 

.1843863 

.1879331 

.3672253 

.3655326 

.3538124 

.05 

.0199388 

.3984439 

.50 

.1914625 

.3520653 

.06 

.0239222 

.3982248 

.51 

.1949743 

.3502919 

.07 

.0279032 

.3979661 

.52 

.1984682 

.3484925 

.08 

.0318814 

.3976677 

.53 

.2019440 

.3466677 

.09 

.0358664 

.3973298 

.54 

.2054015 

.3448180 

.10 

.0398278 

.3969525 

.55 

.2088403 

.3429439 

.11 

.0437953 

.3966360 

.56 

.2122603 

.3410458 

.12 

.0477584 

.3960802 

.57 

.2156612 

.3391243 

.13 

.0517168 

.3955854 

.58 

.2190427 

.3371799 

.14 

.0556700 

.3950517 

.59 

.2224047 

.3362132 

.15 

.0596177 

.3944793 

.60 

.2257469 

.3332246 

.16 

.0635596 

.3938684 

.61 

.2290691 

.3312147 

.17 

.0674949 

.3932190 

.62 

.2323711 

.3291840 

.18 

.0714237 

.3925315 

.63 

.2356527 

.3271330 

.19 

.0753454 

.3918060 

.64 

.2389137 

.3250623 

.20 

.0792597 

.3910427 

.65 

.2421539 

.3229724 

.21 

.0831662 

.3902419 

.66 

.2453731 

.3208638 

.22 

.0870644 

.3894038 

.67 

.2485711 

.3187371 

.23 

.0909541 

.3885286 

.68 

.2617478 

.3165929 

.24 

.0948349 

.3876166 

.69 

.2549029 

.3144317 

.25 

.0987003 

.3866681 

.70 

.2580363 

.3122539 

.26 

.1025681 

.3856834 

.71 

.2611479 

.3100603 

.27 

.1064199 

.3846627 

.72 

.2642375 

.3078513 

.28 

.1102612 

.3836063 

.73 

.2673049 

.3056274 

.29 

.1140919 

.3825146 

.74 

.2703500 

.3033893 

.30 

.1179114 

.3813878 

.75 

.2733726 

.3011374 

.31 

.1217195 

.3802264 

.76 

.2763727 

.2988724 

.32 

.1255158 

.3790305 

.77 

.2793501 

.2965948 

.33. 

.1293000 

.3778007 

.78 

.2823046 

.2943060 

.34 

.1330717 

.3765372 

.79 

.2852361 

.2920038 

.35 

.1368307 

.3752403 

.80 

.2881446 

.2896916 

.36 

.1405764 

.3739106 

.81 

.2910299 

.2873689 

.37 

.1443088 

.3725483 

.82 

.2938919 

.2850364 

.38 

.1480273 

.3711539 

.83 

.2967306 

.2826945 

.39 

.1517317 

.3697277 

.84 

.2995458 

.2803438 

.40 

.1555417 

.3682707 

.85 

.3023375 

.2779849 

.41 ' 

.1590970 

.3667817 

.86 

.3051055 

.2756182 

.42 

.1627573 

.3652627 

.87 ! 

.3078498 

.2732444 

.43 

.1664022 

.3637136 

.88 

.3105703 

.2708640 

.44 

.1700314 

.3621349 

.89 

.3132671 

.2684774 

.45 

.1736448 

.3605270 

.90 

.3159399 

.2660852 


* From Kent, "The Elements of Statistics." 


Table 1.—Area and Ordinate of the Normal Curve.*—(Continued) 

x/σ    Area    Ordinate (y)    x/σ    Area    Ordinate (y) 

.91 

.3185887 

.2636880 

1.36 

.4130850 

.1582248 

.92 

.3212136 

.2612863 

1.37 

.4146566 

.1560797 

.93 

.3238145 

.2588805 

1.38 

.4162067 

.1539483 

.94 

.3263912 

.2564713 

1.39 

.4177356 

.1518308 

.93 

.3289439 

.2540691 

1.40 

.4192433 

.1497276 

.96 

.3314724 

.2616443 

1.41 

.4207302 

.1476385 

.97 

.3339768 

.2492277 

1.42 

.4221062 

.1455641 

.98 

.3364569 

.2468096 

1.43 

.4236415 

.1435046 

.99 

.3389129 

.2443904 

1.44 

.4250663 

.1414600 

1.00 

.3413447 

.2419707 

1.46 

.4264707 

.1394306 

1.01 

.3437524 

.2396611 

1.46 

.4278660 

.1374165 

1.02 

.3461358 

.2371320 

1.47 

.4292191 

.1354181 

1.03 

.3484960 

.2347138 

1.48 

.4305634 

.1334353 

1.04 

.3508300 

.2322970 

1.49 

.4318879 

.1314684 

1.05 

.3531409 

.2298821 

1.50 

.4331928 

.1296176 

1.06 

.3664277 

.2274696 

1.61 

.4344783 

.1276830 

1.07 

.3576903 

.2260699 

1.52 

.4357445 

.1256646 

1.08 

.3599289 

.2226635 

1.53 

.4369916 

.1237628 

1.09 

.3621434 

.2202608 

1.54 

.4382198 

.1218775 

1.10 

.3643339 

.2178622 

1.55 

.4394292 

.1200090 

1.11 

.3666006 

,2164682 

1.56 

.4406201 

.1181673 

1.12 

.3686431 

.2130091 

1.57 

.4417924 

.1163226 

1.13 

.3707619 

,2106856 

1.68 

.4429466 

.1145048 

1.14 

.3728568 

.2083078 

1.59 

.4440826 

.1127042 

1.15 

.3749281 

.2059363 

1.60 

.4452007 1 

.1109208 

1.16 

.3769756 

.2036714 

1.61 

.4463011 

.1091548 

1.17 

.3789995 

.2012135 

1.62 

.4473839 

.1074061 

1.18 

.3809999 

.1988631 

1.63 

.4484493 

.1056748 

1.19 

.3829768 

.1965205 

1.64 

.4494974 

.1039611 

1.20 

.3849303 

.1941861 

1.65 

.4505285 

.1022649 

1.21 

.3868606 

.1918602 

1.66 

.4516428 

.1006864 

1.22 

.8887676 

.1895432 

1.67 

.4525403 

.0989255 

1.23 

.3906514 

.1872354 

1.68 

.4535213 

.0972823 

1.24 

.8925123 

.1849373 

1.69 

.4544860 

.0956568 

1.25 

.3943502 

.1826491 

1.70 

.4564345 j 

.0940491 

1.26 

.3961653 

.1803712 

1.71 

.4663671 

.0924591 

1.27 

.3979577 

.1781038 

1.72 

.4572838 

.0908870 

1.28 

.8997274 

.1758474 

1.73 i 

.4581849 

.0893326 

1.29 

.4014747 

.1736022 

1.74 

.4590705 

.0877961 

1.80 

.4031995 

.1713686 

1.75 

.4599408 

.0862773 

1.31 

.4049021 

.1691468 

1.76 

.4607961 

.0847764 

1.82 

.4065825 

.1669370 

1.77 

.4616364 

.0832932 

1.83 

.4082409 

.1647397 

1.78 

.4624620 

.0818278 

1.84 

.4098773 

.1625551 

1.79 

.4632730 

.0803801 

^ 1.85 

.4114920 

.1603833 

1.80 

.4640697 

.0789502 


Table 1.—Area and Ordinate of the Normal Curve.*—(Continued) 

x/σ    Area    Ordinate (y)    x/σ    Area    Ordinate (y) 

1.81 

.4648521 

.0775379 

2.26 

.4880894 

.0310319 

1.82 

.4656205 

.0761433 

2.27 

.4883962 

.0303370 

1.83 

.4663760 

.0747663 

2.28 

.4886962 

.0296546 

1.84 

.4671159 

.0734068 

2.29 

.4889893 

.0289847 

1.85 

.4678432 

.0720649 

2.30 

.4892759 

.0283270 

1.86 

.4685572 

.0707404 

2.31 

.4896659 

.0276816 

1.87 

.4692581 

.0694333 

2.32 

.4898296 

.0270481 

1.88 

.4699460 

.0681438 

2.33 

.4900969 

.0264265 

1.89 

.4706210 

.0668711 

2.34 

.4903581 

.0258166 

1.90 

.4712834 

.0656158 

2.35 

.4906133 

.0252182 

1.91 

.4719334 

.0643777 

2.36 

.4908625 

.0246313 

1.92 

.4725711 

.0631568 

2.37 

.4911060 

.0240556 

1.93 

.4731966 

.0619524 

2.38 

.4913437 

.0234910 

1.94 

.4738102 

.0607652 

2.39 

.4915768 

.0229374 

1.95 

.4744119 

.0596947 

2.40 

.4918025 

.0223946 

1.96 

.4750021 

.0584409 

2.41 

.4920237 

.0218624 

1.97 

.4765808 

.0573038 

2.42 

.4922397 

.0213407 

1.98 

.4761482 

.0661831 

2.43 

.4924506 

.0208294 

1.99 

.4767045 

.0560789 

2.44 

.4926564 

.0203284 

2.00 

,4772499 

.0539910 

2.45 

.4928572 

.0198374 

2.01 

.4777844 

.0529192 

2.46 

.4930531 

.0193563 

2.02 

.4783083 

.0618636 

2.47 

.4932443 

.0188850 

2.03 

.4788217 

.0608239 

2.48 

.4934309 

.0184233 

2.04 

.4793248 

.0498001 

2.49 

.4936128 

.0179711 

2.05 

.4798178 

.0487920 

2.50 

.4937903 

.0175283 

2.06 

.4803007 

.0477998 

2.51 

.4939634 

.0170947 

2.07 

.4807738 

.0468226 

2.62 

.4941323 

.0166701 

2.08 

.4812372 

.0458611 

2.53 

.4943001 

.0162462 

2.09 

.4816911 

.0449143 

2.54 

.4944574 

.0158476 

2.10 

.4821366 

.0439836 

2.55 

.4946139 

.0154493 

2.11 

.4825708 

.0430674 

2.56 

.4947664 

.0150596 

2.12 

.4829970 

.0421661 

2.67 

.4949151 

.0146782 

2.13 

.4834142 

.0412795 

2.58 

.4960600 

.0143061 

2.14 

.4838220 

.0404076 

2.59 

.4952012 

.0139401 

2.15 

.4842224 

.0395500 

2.60 

.4953388 

1 

.0135830 

2.16 

.4846137 

.0387069 

2.61 

.4954729 

.0132337 

2.17 

.4849966 

.0378779 

2.62 

.4956035 

.0128921 

2.18 

.4853713 

.0370629 

2.63 

.4957308 

.0125581 

2.19 

.4857379 

.0362619 

2.64 

.4958547 

.0122315 

2.20 

.4860966 

.0354746 

2.65 

.4959754 

.0119122 

2.21 

.4864474 

.0347009 

2.66 

.4960930 

.0116001 

2.22 

.4867906 

.0339408 

2.67 

.4962074 

.0112951 

2.23 

.4871263 

.0331939 

2.68 

.4963189 

.0109969 

2.24 

.4874545 

.0324603 

2.69 

.4964274 

.0107056 

2.25 

.4877755 

.0317397 

2.70 

.4966330 

.0104209 


Table 1.—Area and Ordinate of the Normal Curve.*—(Continued) 

x/σ    Area    Ordinate (y)    x/σ    Area    Ordinate (y) 

2.71 

.4966358 

.0101428 

3.16 

.4992112 

.0027076 

2.72 

.4967369 

.0098712 

3.17 

.4992378 

.0026231 

2.73 

.4968333 

.0096068 

3.18 

.4992636 

.0026412 

2.74 

.4969280 

.0093466 

3.19 

.4992886 

.0024616 

2.75 

.4970202 

.0090936 

3.20 

.4993129 

.0023841 

2.76 

.4971099 

.0088466 

8.21 

.4993363 

.0023080 

2.77 

.4971972 

.0086062 

3.22 

.4993590 

.0022368 

2.78 

.4972821 

.0083697 

3.23 

.4993810 

.0021649 

2.79 

.4973646 

.0081398 

3.24 

.4994024 

.0020960 

2.80 

.4974449 

.0079166 

8.26 

.4994230 

.0020290 

2.81 

.4976229 

.0076966 

8.26 

.4994429 

.0019641 

2.82 

.4976988 

.0074829 

8.27 

.4994623 

.0019010 

2.83 

.4976726 

.0072744 

8.28 

.4994810 

.0018397 

2.84 

.4977443 

.0070711 

8.29 

.4994991 

.0017803 

2.86 

.4978140 

.0068728 

3.30 

.4996166 

.0017226 

2.86 

.4978818 

.0066793 

8.81 

.4995335 

.0016666 

2.87 

.4979476 

.0064907 

8.32 

.4996499 

.0016122 

2.88 

.4980116 

.0063067 

8.33 

.4996668 

.0016696 

2.89 

.4980738 

.0061274 

3.34 

.4996811 

.0016084 

2.90 

.4981342 

.0069625 

8.86 

.4996969 

.0014687 

2.91 

.4981929 

.0067821 

8.36 

.4996103 

.0014106 

2.92 

.4982498 

,0066160 

3.37 

.4996242 

.0013639 

2.93 

.4983062 

.0064641 

8.38 

.4996376 

.0013187 

2.94 

.4983689 

.0062963 

8.39 

.4996606 

.0012748 

2.96 

.4984111 

.0061426 

3.40 

.4996631 

.0012322 

2.96 

.4984618 

.0049929 

8.41 

.4996762 

.0011910 

2.97 

.4986110 

.0048470 

8.42 

.4996869 , 

.0011610 

2.98 

.4986688 

.0047060 

3.43 

.4996982 

.0011122 

2.99 

.4986061 

.0046666 

8.44 

.4997091 

.0010747 

8.00 

.4986601 

.0044318 

3.45 

.4997197 

.0010383 

8.01 

.4986938 

.0043007 

8.46 

.4997299 

.0010030 

8.02 

.4987361 

.0041729 

8.47 

.4997398 

.0009689 

8.03 

.4987772 

.0040486 

8.48 

.4997493 

.0009368 

8.04 

.4988171 

.0039276 

8.49 

.4997686 

.0009037 

8.06 

.4988668 

.0038098 

8.60 

.4997674 

.0008727 

8.06 

.4988933 

.0036961 

8.61 

.4997769 

.0008426 

8.07 

.4989297 

.0036836 

8.62 

.4997842 

.0008136 

8.08 

.4989660 

.0034761 

8.63 

.4997922 

.0007863 

8.09 

.4989992 

.0033696 

8.64 

.4997999 

.0007681 

8.10 

.4990324 

.0032668 

8.55 

.4998074 

.0007317 

8.11 

.4900646 

.0031669 

8.66 

.4998146 

.0007061 

8.12 

.4990967 

.0030698 

8.67 

.4998216 

.0006814 

8.18 

.4991260 

.0029764 

8.68 

.4998282 

.0006676 

8.14 

.4991663 

.0028836 

8.69 

.4998347 

.0006343 

8.16 

.4991836 

.0027943 

8.60 

.4098409 

.0006119 


Table 1.—Area and Ordinate of the Normal Curve.*—(Continued) 

x/σ    Area    Ordinate (y)    x/σ    Area    Ordinate (y) 

3.61 

.4998469 

.0006902 

4.06 

.4999766 

.0001061 

3.62 

.4998627 

.0006693 

4.07 

.4999766 

.0001009 

3.63 

.4998683 

.0006490 

4.08 

.4999775 

.0000969 

3.64 

.4998637 

.0006294 

4.09 

.4999784 

.0000930 

3.66 

.4998689 

.0006106 

4.10 

.4999793 

.0000893 

3.66 

.4998739 

.0004921 

4.11 

.4999802 

.0000867 

3.67 

.4998787 

.0004744 

4.12 

.4999811 

.0000822 

3.68 

.4998834 

.0004673 

4.13 

.4999819 

.0000789 

3.66 

.4998879 

.0004408 

4.14 

.4999826 

.0000767 

3.70 

.4998922 

.0004248 

4.16 

.4999834 

.0000726 

3.71 

.4998964 

.0004093 

4.16 

.4999841 

.0000697 

3.72 

.4999004 

.0003800 

4.17 

.4999848 

.0000668 

3.73 

.4999043 

.0003661 

4.18 

.4999864 

.0000641 

3.74 

.4999080 

.0003626 

4.19 

.4999861 

.0000616 

3.76 

.4999116 

.0003886 

4.20 

.4999867 

.0000689 

3.76 

.4999160 

.0003396 

4.21 

.4999872 

.0000666 

3.77 

.4999184 

.0003271 

4.22 

.4999878 

.0000642 

3.78 

.4999216 

.0003149 

4.23 

.4999883 

.0000510 

3.79 

.4999247 

.0003032 

4.24 

.4999888 

.0000498 

3.80 

.4999277 

.0002919 

4.26 

.4999893 

.0000477 

3.81 

.4999306 

.0002810 

4.26 

.4999898 

.0000467 

3.82 

.4999333 

.0002706 

4.27 

.4999902 

.0000438 

3.83 

.4999369 

.0002604 

4.28 

.4999907 

.0000420 

3.84 

.4999385 

.0002606 

4.29 

.4999911 

.0000402 

3.86 

.4999409 

.0002411 

4.30 

.4999916 

.0000386 

3.86 

.4999433 

.0002320 

4.31 

.4999918 

.0000369 

8.87 

.4999466 

.0002232 

4.32 

.4999922 

.0000364 

3.88 

.4999478 

.0002147 

4.33 

.4999926 

.0000339 

8.89 

.4999499 

.0002065 

4.34 

.4999929 

.0000324 

3.90 

.4999519 

.0001987 

4.36 

.4999932 

.0000310 

3.91 

.4999639 

.0001910 

4.36 

.4999935 

.0000297 

3.92 

.4999667 

.0001837 

4.37 

.4099938 

.0000284 

3.93 

.4999675 

.0001766 

4.38 

.4999941 

.0000272 

3.94 

.4999693 

.0001698 

4.39 

.4999943 

.0000261 

3.96 

.4999609 

.0001633 

4.40 

.4999946 

.0000249 

3.96 

.4999626 

.0001569 

4.41 

.4999948 

.0000239 

3.97 

.4999641 

.0001608 

4.42 

.4999961 

.0000228 

3.98 

.4999656 

.0001449 

4.43 

.4999963 

.0000218 

8.99 

.4999670 

.0001393 

4.44 

.4999966 

.0000209 

4.00 

.4999683 

.0001338 

4.46 

.4999967 

.0000200 

4.01 c 

.4999696 

.0001286 

4.46 

.4999969 

.0000191 

4.02 

.4999709 

.0001236 

4.47 

.4999961 

.0000183 

4.03 

.4999721 

.0001186 

4.48 

.4999963 

.0000176 

4.04 

.4999733 

.0001140 

4.49 

.4999964 

.0000167 

4.06 

.4999744 

.0001094 

4.60 

.4999966 

.0000160 




.01 

6.635 

9.210 

11.341 

13.277 

15.086 

16.812 

18.475 

20.090 

21.666 

23.209 

24.725 

26.217 

27.688 

29.141 

30.578 

32.000 

33.409 

34.805 

36.191 

37.566 

38.932 

40.289 

41.638 

42.980 

44.314 

45.642 

46.963 

48.278 

49.588 

50.892 

.02 

5.412 

7.824 

9.837 

11.668 

13.388 

15.033 

16.622 

18.168 

19.679 

21.161 

22.618 

24.054 

25.472 

26.873 

28.259 

29.633 

30.995 

32.346 

33.687 

35.020 

36.343 

37.659 

38.968 

40.270 

41.566 

42.856 

44.140 

45.419 

46.693 

47.962 

.05 

3.841 

5.991 

7.815 

9.488 

11.070 

12.592 

14.067 

15.507 

16.919 

18.307 

19.675 

21.026 

22.362 

23.685 

24.996 

26.296 

27.587 

28.869 

30.144 

31.410 

32.671 

33.924 

35.172 

36.415 

37.652 

38.885 

40.113 

41.337 

42.557 

43.773 

.10 

2.706 

4.605 

6.251 

7.779 

9.236 

10.645 

12.017 

13.362 

14.684 

15.987 

17.275 

18.549 

19.812 
21.064 
22.307 

23.542 

24.769 

25.989 

27.204 

28.412 

29.615 

30.813 
32.007 
33.196 

34.382 

35.563 

36.741 

37.916 

39.087 

40.256 

.20 

1.642 
3.219 

4.642 
5.989 
7.289 

8.558 

9.803 

11.030 

12.242 

13.442 

14.631 

15.812 

16.985 

18.151 

19.311 

20.465 

21.615 

22.760 

23.900 

25.038 

26.171 

27.301 

28.429 

29.553 

30.675 

31.795 

32.912 

34.027 

35.139 

36.250 

.30 

1.074 

2.408 

3.665 

4.878 

6.064 

7.231 

8.383 

9.524 

10.656 

11.781 

12.899 

14.011 

15.119 

16.222 

17.322 

18.418 

19.511 

20.601 

21.689 

22.775 

23.858 

24.939 

26.018 

27.096 

28.172 

29.246 

30.319 

31.391 

32.461 

33.530 

.50 

.455 

1.386 

2.366 

3.357 

4.351 

5.348 

6.346 

7.344 

8.343 

9.342 

10.341 

11.340 

12.340 

13.339 

14.339 

15.338 

16.338 

17.338 

18.338 

19.337 

20.337 

21.337 

22.337 

23.337 

24.337 

25.336 

26.336 

27.336 

28.336 

29.336 

.70 

.148 

.713 

1.424 

2.195 

3.000 

3.828 

4.671 

5.527 

6.393 

7.267 

8.148 

9.034 

9.926 

10.821 

11.721 

12.624 

13.531 

14.440 

15.352 

16.266 

17.182 

18.101 

19.021 

19.943 

20.867 

21.792 

22.719 

23.647 

24.577 

25.508 

.80 

.0642 

.446 

1.005 

1.649 

2.343 

3.070 

3.822 

4.594 

5.380 

6.179 

6.989 

7.807 

8.634 

9.467 

10.307 

11.152 

12.002 

12.857 

13.716 

14.578 

15.445 

16.314 

17.187 

18.062 

18.940 

19.820 

20.703 

21.588 

22.475 

23.364 

.90 

.0158 

.211 

.584 

1.064 

1.610 

2.204 
2.833 
3.490 
4.168 
4.865 

5.578 

6.304 

7.042 

7.790 

8.547 

9.312 

10.085 

10.865 

11.651 

12.443 

13.240 

14.041 

14.848 

15.659 

16.473 

17.292 

18.114 

18.939 

19.768 

20.599 

.95 

.00393 

.103 

.352 

.711 

1.145 

1.635 

2.167 

2.733 

3.325 

3.940 

4.575 

5.226 

5.892 

6.571 

7.261 

7.962 

8.672 

9.390 

10.117 

10.851 

11.591 

12.338 

13.091 

13.848 

14.611 

15.379 

16.151 

16.928 

17.708 

18.493 

.98 

.000628 

.0404 

.185 

.429 

.752 

1.134 

1.564 

2.032 

2.532 

3.059 

3.609 

4.178 

4.765 

5.368 

5.985 

6.614 

7.255 

7.906 

8.567 

9.237 

9.915 

10.600 

11.293 

11.992 

12.697 

13.409 

14.125 

14.847 

15.574 

16.306 

1 

r«. 

10 

r4ON>ne0QO «soov>t<>Q r».fSNONO'^ ooomovo«o 

r^.t^'^oo^o »nt*«0'0P>» OM^\Ovri»o 

OOC4vOO^ OO-^OVOC-* 00 ^ 0*^0010 i-400>or>40\ 

«M».4c«ir4 «oco'^'e*io iovot««r»oo ooo^OO*-< 


\Dt^COO\G I-Mcsiro-^IO ^0^^000\0 VOt^oOO^O 


For larger values of n, the expression $\sqrt{2\chi^2} - \sqrt{2n - 1}$ may be used as a normal deviate with unit standard deviation. 

* This table is taken by consent from Statistical Methods for Research Workers by Prof. R. A. Fisher, published by Oliver & Boyd, Edinburgh, and attention is 
drawn to the larger collection in Statistical Tables by Prof. R. A. Fisher and F. Yates, published by Oliver & Boyd, Edinburgh. 
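
The note on larger values of n can be tried out directly. The sketch below is an illustration under stated assumptions, not part of Fisher's table: it treats the square root of twice chi-square, less the square root of 2n - 1, as a unit normal deviate and returns the probability of exceeding it. The function name `chi_square_tail_large_n` is hypothetical. The check uses n = 30, the last tabled row, so that the result can be set beside the tabled probability of .05 for a chi-square of 43.773.

```python
import math
from statistics import NormalDist


def chi_square_tail_large_n(chi_square: float, n: int) -> float:
    """Approximate P for a chi-square value with n degrees of freedom (n large)."""
    # The expression given in the note, used as a unit normal deviate
    deviate = math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * n - 1.0)
    # Probability that a unit normal deviate exceeds the computed value
    return 1.0 - NormalDist().cdf(deviate)


p = chi_square_tail_large_n(43.773, 30)
print(round(p, 3))
```

The approximation falls a little below .05 at n = 30 and should improve as n grows.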













































APPENDIX 


305 


Table 3.—Binomial Coefficients* 



* From Davis and Nelson, Elements of Statistics, p. 31. By permission of the Cowles 
Commission for Research in Economics, Chicago. For a larger table, see T. C. Fry, Probability 
and Its Engineering Uses, pp. 439-452. 
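
Binomial coefficients of the kind listed in Table 3 can be generated directly. The following is a small sketch in Python; the function name `binomial_row` is introduced here for illustration, and the printed layout is an assumption rather than the arrangement used by Davis and Nelson.

```python
from math import comb


def binomial_row(n: int) -> list[int]:
    """Return the coefficients of (a + b)**n, i.e. C(n, 0) through C(n, n)."""
    return [comb(n, r) for r in range(n + 1)]


for n in range(1, 6):
    print(n, binomial_row(n))
```

Each row sums to 2 raised to the power n, which gives a quick check on any entry copied by hand.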




















Table 4.—Values of the Correlation Coefficient for Different 
Levels of Significance* 


n 

.05 

.01 

1 

.996917 

.9998766 

2 

.95000 

.990000 

3 

.8783 

.95873 

4 

.8114 

.91720 

5 

.7545 

.8745 

6 

.7067 

.8343 

7 

.6664 

.7977 

8 

.6319 

.7646 

9 

.6021 

.7348 

10 

.5760 

.7079 

11 

.5529 

.6835 

12 

.5324 

.6614 

13 

.5139 

.6411 

14 

.4973 

.6226 

15 

.4821 

.6055 

16 

.4683 

.5897 

17 

.4555 

.5751 

18 

.4438 

.5614 

19 

.4329 

.5487 

20 

.4227 

.5368 

25 

.3809 

.4869 

30 

.3494 

.4487 

35 

.3246 

.4182 

40 

.3044 

.3932 

45 

.2875 

.3721 

50 

.2732 

.3541 

60 

.2500 

.3248 

70 

.2319 

.3017 

80 

.2172 

.2830 

90 

.2050 

.2673 

100 

.1946 

.2540 


For a total correlation, n is 2 less than the number of pairs in the sample; for a partial 
correlation, the number of eliminated variates also should be subtracted. 

* This table is taken by consent from Statistical Methods for Research Workers by Prof. 
R. A. Fisher, published by Oliver & Boyd, Edinburgh, and attention is drawn to the larger collection 
in Statistical Tables by Prof. R. A. Fisher and F. Yates, published by Oliver & Boyd, Edinburgh. 
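
The rule for finding n, and the use of the table once n is known, can be sketched as follows. This is an illustration only: the names `TABLE_4`, `degrees_of_freedom`, and `significance_level` are hypothetical, and only four rows of Table 4 are copied into the lookup, so other values of n would have to be added from the table before use.

```python
# A few rows copied from Table 4: n -> (r at the .05 level, r at the .01 level)
TABLE_4 = {
    10: (0.5760, 0.7079),
    20: (0.4227, 0.5368),
    30: (0.3494, 0.4487),
    100: (0.1946, 0.2540),
}


def degrees_of_freedom(pairs: int, eliminated_variates: int = 0) -> int:
    """n is 2 less than the number of pairs, less any eliminated variates."""
    return pairs - 2 - eliminated_variates


def significance_level(r: float, pairs: int, eliminated_variates: int = 0) -> str:
    """Compare |r| with the tabled values for the appropriate n."""
    n = degrees_of_freedom(pairs, eliminated_variates)
    r_05, r_01 = TABLE_4[n]   # raises KeyError if the row was not copied in
    if abs(r) >= r_01:
        return ".01"
    if abs(r) >= r_05:
        return ".05"
    return "not significant"


print(significance_level(0.45, pairs=22))   # n = 20, so .4227 and .5368 apply
```

With 22 pairs, a total correlation of .45 exceeds the .05 value of .4227 but not the .01 value of .5368.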








Table 5.—Values of z for Given Values of r* 


r 

.000 

.001 

.002 

.003 

.004 

.005 

.006 

.007 

.008 

.009 

.000 

.0000 

.0010 

.0020 

.0030 

.0040 

.0060 

.0060 

.0070 

.0080 

.0090 

.010 

.0100 

.0110 

.0120 

.0130 

.0140 

.0160 

.0160 

.0170 

.0180 

.0190 

.020 

.0200 

.0210 

.0220 

.0230 

.0240 

.0260 

.0260 

.0270 

.0280 

.0290 

.030 

.0300 

.0310 

.0320 

.0330 

.0340 

.0350 

.0360 

.0370 

.0380 

.0390 


.0400 

.0410 

.0420 

.0430 

.0440 

.0460 

.0460 

.0470 

.0480 

.0490 


.0601 

.0611 

.0621 

.0631 

.0541 

.0561 

.0661 

.0671 

.0581 

.0691 

.060 

.0601 

.0611 

.0621 

.0631 

.0641 

.0661 

.0661 

.0671 

.0681 

.0691 

.070 

.0701 

.0711 

.0721 

.0731 

.0741 

.0761 

.0761 

.0771 

.0782 

.0792 

.080 

.0802 

.0812 

.0822 

.0832 

.0842 

.0852 

.0862 

.0872 

.0882 

.0892 

.090 


.0912 

.0922 

.0933 

.0943 

.0953 

.0963 

.0973 

.0983 

.0993 

.100 


.1013 

.1024 

.1034 

.1044 

.1054 

.1064 

.1074 

.1084 

.1094 

.110 

.1106 

.1116 

.1126 

.1136 

.1146 

.1165 

.1166 

.1176 

.1186 

.1196 

.120 


.1216 

.1226 

.1236 

.1246 

.1267 

.1267 

.1277 

.1287 

.1297 

.130 

.1308 

.1318 

.1328 

.1338 

.1348 

.1368 

.1368 

.1379 

.1389 

.1399 

.140 


.1419 

.1430 

.1440 

.1460 

.1460 

.1470 

.1481 

.1491 

.1601 

.150 

.1611 

.1622 

.1632 

.1642 

.1662 

.1663 

.1673 

.1583 

.1593 

.1604 

.160 

.1614 

.1624 

.1634 

.1664 

.1665 

.1666 

.1676 

.1686 

.1696 

.1706 

.170 

.1717 

.1727 

.1737 

.1748 

.1768 

.1768 

.1779 

.1789 

.1799 

.1810 

.180 

.1820 

.1830 

.1841 

.1861 

.1861 

.1872 

.1882 

.1892 

.1903 

.1913 

.190 

.1923 

.1934 

.1944 

.1964 

.1966 

.1976 

.1986 

.1996 

.2007 

.2017 

.200 

.2027 

.2038 

.2048 

.2069 

.2069 

.2079 

.2090 

.2100 

.2141 

.2121 

.210 

.2132 

.2142 

.2163 

.2163 

.2174 

.2184 

.2194 

.2206 

.2216 

.2226 

.220 

.2237 

.2247 

.2268 

.2268 

.2279 

.2289 

.2300 

.2310 

.2321 

.2331 

.230 

.2342 

.2363 

.2363 

.2374 

.2384 

.2395 

.2406 

.2416 

.2427 

.2437 

.240 

.2448 

.2468 

.2469 

.2480 

.2490 

.2601 

.2611 

.2522 

.2533 

.2643 

.250 

.2664 

.2666 

.2675 

.2686 

.2697 

.2608 

.2618 

.2629 

.2640 

.2660 

.260 

.2661 

.2672 

.2682 

.2693 

.2704 

.2716 

.2726 

.2736 

.2747 

.2768 

.270 

.2769 

.2779 

.2790 

.2801 

.2812 

.2823 

.2833 

.2844 

.2856 

.2866 

.280 

.2877 

.2888 

.2898 

.2909 

.2920 

.2931 

.2942 

.2963 

.2964 

.2976 

.290 

.2986 

.2997 

.3008 

.3019 

.3029 

.3040 

.3051 

.3062 

.3073 

.3084 


.3096 

.3106 

.3117 

.3128 

.3139 

.3150 

.3161 

.3172 

.3183 

.3196 

.310 

.3206 

.3217 

.3228 

.3239 

.3250 

.3261 

.3272 

.3283 

.3294 

.3306 

.320 

.3317 

.3328 

.3339 

.3360 

.3361 

.3372 

.3384 

.3395 

.3406 

.3417 

.330 

.3428 

.3439 

.3461 

.3462 

.3473 

.3484 

.3496 

.3507 

.3518 

.3630 

.340 

.3641 

.3662 

.3664 

.3676 

.3686 

.3697 

.3609 

.3620 

.3632 

.3643 

.350 

.3664 

.3666 

.3677 

.3689 

.3700 

.3712 

.3723 

.3734 

.3746 

.3767 

.360 

.3769 

.3780 

.3792 

.3803 

.3816 

.3826 

.3838 

.3860 

.3861 

.3873 

.370 

.3884 

.3896 

.3907 

.3919 

.3931 

.3942 

.3964 

.3966 

.3977 

.3989 

.380 

.4001 

.4012 

.4024 

.4036 

.4047 

.4069 

.4071 

.4083 

.4094 

.4106 

.390 

.4118 

.4130 

.4142 

.4153 

.4166 

.4177 

.4189 

.4201 

.4213 

.4225 

.400 

.4236 

.4248 

.4260 

.4272 

.4284 

.4296 

.4308 

.4320 

.4332 

.4344 

.410 

.4366 

.4368 

.4380 

.4392 

.4404 

.4416 

.4429 

.4441 

.4453 

.4466 

.420 

.4477 

.4489 

.4601 

.4613 

.4626 

.4638 

.4660 

.4662 

.4674 

.4687 

.430 

.4699 

.4611 

.4623 

.4636 

.4648 

.4660 

.4673 

.4686 

.4697 

.4710 

.440 

.4722 

.4736 

.4747 

.4760 

.4772 

.4784 

.4797 

.4809 

.4822 

.4836 

.450 

.4847 

.4860 

.4872 

.4886 

.4897 

.4910 

.4923 

.4036 

.4948 

.4961 

.460 

.4973 

.4986 

.4999 

.6011 

.5024 

.6037 

.6049 

.6062 

.6076 

.6088 

.470 

.6101 

.6114 

.6126 

.6139 

.6162 

.6166 

.6178 

.6191 

.6204 

.6217 

.480 

.6230 

.6243 

.6266 

.6279 

.6282 

.6296 

.6308 

.6321 

.6334 

.6347 

.490 

.6361 

.6374 

.5387 

.6400 

.6413 

.6427 

.6440 

.6463 

.6466 

.6480 


* From Albert E. Waugh, Laboratory Manual and Problems for Elements of Statistical 
Method, pp. 32-33, McGraw-Hill Book Company, Inc., New York. 
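
Values of z such as those in Table 5 can also be computed rather than looked up. The sketch below assumes that z here is Fisher's transformation of r, one-half the natural logarithm of (1 + r)/(1 - r); that assumption is ours, though it reproduces the legible tabled entries checked here. The function name `fisher_z` is hypothetical.

```python
import math


def fisher_z(r: float) -> float:
    """Transform a correlation coefficient r (between -1 and 1) to z."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


for r in (0.200, 0.400):
    print(f"{r:.3f} {fisher_z(r):.4f}")
# prints:
# 0.200 0.2027
# 0.400 0.4236
```

Both printed values agree with the entries for r = .200 and r = .400 in the table.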









































Table 5.—Values of z for Given Values of r.—(Continued) 






















































































Foreword to Table 6.—To extract the square root of any number, we 
begin at the decimal point and group the figures by pairs in both directions. 

For example, 7,500,000,000,000 becomes 07 50 00 00 00 00 00. In 
Table 6 we look up the figure 750. Its square root is seen to be 27.3861. 
We allow one figure in the root for each pair of figures in the number. There 
are seven pairs to the left of the decimal in our number and none to the right 
of the decimal, so the root will contain seven figures to the left of the decimal, 
thus: 2,738,610. In looking up a square root, never separate the figures in 
a pair. In our illustration it would be wrong to find the square root of the 
number 75 or of the number 7500. 

When the square root of a large number (e.g., 7,583,615,000,000) cannot 
be found exactly from the table, the nearest approximation is often taken 
(e.g., take the square root of 7,580,000,000,000 as roughly equivalent to the 
square root of 7,583,615,000,000). Where greater accuracy is required, a 
larger table may be used (see Barlow's Tables of Squares, Cubes, Square 
Roots, Cube Roots, Reciprocals of All Integer Numbers Up to 10,000, Spon and 
Chamberlain, 120 Liberty Street, New York), or any elementary textbook in 
algebra may be consulted for the method of extracting a square root. 
Calculating machine companies furnish pamphlets describing how to extract 
a square root on their machines. A slide rule gives approximate square 
roots easily and rapidly. 
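
The pairing rule of the foreword can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a procedure given in the text: the helper `table_sqrt` stands in for a lookup in Table 6 (numbers 1 to 1,000, roots to four decimals), the names `table_sqrt` and `sqrt_by_pairs` are hypothetical, and the figures beyond the leading pairs are simply dropped rather than rounded to the nearest tabled number.

```python
import math


def table_sqrt(n: int) -> float:
    """Stand-in for Table 6: square root of a whole number to four decimals."""
    return round(math.sqrt(n), 4)


def sqrt_by_pairs(x: int) -> float:
    """Approximate the square root of a positive whole number by the pairing rule."""
    digits = str(x)
    if len(digits) % 2:                  # pad on the left so the figures split into pairs
        digits = "0" + digits
    pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    used = min(2, len(pairs))
    lead = int("".join(pairs[:used]))    # leading pairs, never splitting a pair
    if lead > 1000:                      # beyond the range of the table: use one pair only
        used = 1
        lead = int(pairs[0])
    remaining = len(pairs) - used        # each remaining pair adds one figure to the root
    return table_sqrt(lead) * 10 ** remaining


print(f"{sqrt_by_pairs(7_500_000_000_000):,.0f}")   # prints 2,738,610
```

For 7,500,000,000,000 the leading pairs are 07 50, so 750 is looked up and the root 27.3861 is shifted by the five remaining pairs, as in the worked example of the foreword.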





Table 6.—Squares and Square Roots* 


Number 

Square 

Square root 

Number 

Square 

Square root 

1 


1.0000 

41 

1681 

6.4031 

2 


1.4142 

42 

1764 

6.4807 

3 

9 

1.7321 

43 

1849 

6.5574 

4 


2.0000 

44 

1936 

6.6332 

5 


2.2361 

45 

2025 

6.7082 

6 


2.4495 

46 

2116 

6.7823 

7 

49 

2.6458 

47 

2209 

6.8557 

8 

64 

2.8284 

48 

2304 

6.9282 

9 

81 

3.0000 

49 

2401 

7.0000 

10 

100 

3.1623 

50 

2500 

7.0711 

11 

121 

3.3166 


2601 

7.1414 

12 

144 

3.4641 

52 

2704 

7.2111 

13 

169 

3.6056 

53 

2809 

7.2801 

14 

196 

3.7417 

54 

2916 


15 

225 

3.8730 

55 

3025 


16 

256 

4.0000 

56 

3136 


17 

289 

4.1231 

57 

3249 


18 

324 

4.2426 

58 

3364 


19 

361 

4.3589 

59 

3481 

7.6811 

20 

400 

4.4721 

60 

3600 

7.7460 

21 

441 

4.5826 

61 

3721 

7.8102 

22 

484 

4.6904 

62 

3844 

7.8740 

23 

529 

4.7958 

63 

3969 

7.9373 

24 

576 

4.8990 

64 

4096 

8.0000 

25 

625 

5.0000 

65 

4225 

8.0623 

26 

676 

5.0990 

66 

4356 

8.1240 

27 

729 

5.1962 

67 

4489 

8.1854 

28 

784 

5.2915 

68 

4624 

8.2462 

29 

841 

5.3852 

69 

4761 

8.3066 

30 

900 

5.4772 

70 

4900 

8.3666 

31 

961 

5.5678 

71 

5041 

8.4261 

32 

1024 

5.6569 

72 

5184 

8.4853 

33 

1089 

5.7446 

73 

5329 

8.5440 

34 

1156 

5.8310 

74 

5476 

8.6023 

35 

1225 

5.9161 

75 

5625 

8.6603 

36 

1296 

6.0000 

76 

5776 

8.7178 

37 

1369 

6.0828 

77 

5929 

8.7750 

38 

1444 

6.1644 

78 

6084 

8.8318 

39 

1521 

6.2450 

79 

6241 

8.8882 

40 

1600 

6.3246 

80 

6400 

8.9443 


* From Herbert Sorenson, Statistics for Students of Psychology and Education, pp. 347-369, 
McGraw-Hill Book Company, Inc., New York. 














Table 6.—Squares and Square Roots.—(Continued) 


Number 

Square 

Square root 


Square 

Square root 

81 

6561 

9.0000 


14641 

11.0000 

82 

6724 

9.0554 


14884 

11.0454 

83 

6889 

9.1104 


15129 

11.0905 

84 

7056 

9.1652 

124 

15376 

11.1355 

85 

7225 

9.2195 

125 

15625 

11.1803 

86 

7396 

9.2736 

126 

15876 

11.2250 

87 

7569 

9.3274 

127 

16129 

11.2694 

88 

7744 

9.3808 

128 

16384 

11.3137 

89 

7921 

9.4340 

129 

16641 

11.3578 

90 

8100 

9.4868 

130 

16900 

11.4018 

91 

8281 

9.5394 

131 

17161 

11.4455 

92 

8464 

9.5917 

132 

17424 

11.4891 

93 

8649 

9.6437 

133 

17689 

11.5326 

94 

8836 

9.6954 

134 

17956 

11.5758 

95 

9025 

9.7468 

135 

18225 

11.6190 

96 

9216 

9.7980 

136 

18496 

11.6619 

97 

9409 

9.8489 

137 

18769 

11.7047 

98 

9604 

9.8995 

138 

19044 

11.7473 

99 

9801 

9.9499 

139 

19321 

11.7898 

100 


10.0000 

140 

19600 

11.8322 

101 

10201 

10.0499 

141 

19881 

11.8743 

102 


10.0995 

142 

20164 

11.9164 

103 

10609 

10.1489 

143 

20449 

11.9583 

104 

10816 

10.1980 

144 

20736 

12.0000 

105 

11025 

10.2470 

145 

21025 

12.0416 

106 

11236 

10.2956 

146 

21316 

12.0830 

107 

11449 

10.3441 

147 

21609 

12.1244 

108 

11664 

10.3923 

148 

21904 

12.1655 

109 

11881 

10.4403 

149 

22201 

12.2066 

110 

12100 

10.4881 

150 

22500 

12.2474 

111 

12321 

10.5357 

151 

22801 

12.2882 

112 


10.5830 

152 

23104 

12.3288 

113 

12769 

10.6301 

153 

23409 

12.3693 

114 

12996 

10.6771 

154 

23716 

12.4097 

115 

13225 

10.7238 

155 

24025 

12.4499 

116 

13456 

10.7703 

156 

24336 

12.4900 

117 

13689 

10.8167 

157 

24649 

12.5300 

118 


10.8628 

158 

24964 

12.5698 

119 

14161 

10.9087 

159 

25281 

12.6095 

120 

14400 

10.9545 

160 

25600 

12.6491 


















Table 6.—Squares and Square Roots.—(Continued) 


Number 

Square 

161 

25921 

162 

26244 

163 

26569 

164 

26896 

165 

27225 

166 

27556 

167 

27889 

168 

28224 

169 

28561 

170 

28900 

171 

29241 

172 

29584 

173 

29929 

174 

30276 

175 

30625 

176 

30976 

177 

31329 

178 

31684 

179 

32041 

180 

32400 

181 

32761 

182 

33124 

183 

33489 

184 

33856 

185 

34225 

186 

34596 

187 

34969 

188 

35344 

189 

35721 

190 

36100 

191 

36481 

192 

36864 

193 

37249 

194 

37636 

195 

38025 

196 

38416 

197 

38809 

198 

39204 

199 

39601 


40000 


12.6886 

12.7279 

12.7671 

12.8062 

12.8452 

12.8841 

12.9228 

12.9615 

13.0000 

13.0384 


13.0767 

13.1149 

13.1529 

13.1909 

13.2288 

13.2665 

13.3041 

13.3417 

13.3791 

13.4164 

13.4536 

13.4907 

13.5277 

13.5647 

13.6015 

13.6382 

13.6748 

13.7113 

13.7477 

13.7840 


13.8203 

13.8564 

13.8924 

13.9284 

13.9642 

14.0000 

14.0357 

14.0712 

14.1067 

14.1421 


201 

202 

203 

204 

205 

206 

207 

208 

209 

210 

211 

212 

213 

214 

215 

216 

217 

218 

219 

220 

221 

222 

223 

224 

225 

226 

227 

228 

229 

230 

231 

232 

233 

234 

235 

236 

237 

238 

239 

240 


Square 

Square root 

40401 

14.1774 

40804 

14.2127 

41209 

14.2478 

41616 

14.2829 

42025 

14.3178 

42436 

14.3527 

42849 

14.3875 

43264 

14.4222 

43681 

14.4568 

44100 

14.4914 

44521 

14.5258 

44944 

14.5602 

45369 

14.5945 

45796 

14.6287 

46225 

14.6629 

46656 

14.6969 

47089 

14.7309 

47524 

14.7648 

47961 

14.7986 

48400 

14.8324 

48841 

14.8661 

49284 

14.8997 

49729 

14.9332 

50176 

14.9666 

50625 

15.0000 

51076 

15.0333 

51529 

15.0666 

51984 

15.0997 

52441 

15.1327 

52900 

15.1658 

53361 

15.1987 

53824 

15.2315 

54289 

15.2643 

54756 

15.2971 

55225 

15.3297 

55696 

15.3623 

56169 

15.3948 

56644 

15.4272 

57121 

15.4596 

57600 

15.4919 









Table 6.—Squares and Square Roots.—(Continued) 


Number Square 



15.5242 

15.5563 

15.5885 

15.6205 

15.6525 

15.6844 

15.7162 

15.7480 

15.7797 

15.8114 

15.8430 

15.8745 

15.9060 

15.9374 

15.9687 

16.0000 

16.0312 

16.0624 

16.0935 

16.1245 

16.1555 

16.1864 

16.2173 

16.2481 

16.2788 

16.3095 

16.3401 

16.3707 

16.4012 

16.4317 

16.4621 

16.4924 

16.5227 

16.5529 

16.5831 

16.6132 

16.6433 

16.6733 

16.7033 

16.7332 


Square Square root 


16.7631 

16.7929 

16.8226 

16.8523 

16.8819 

16.9115 

16.9411 

16.9706 

17.0000 

17.0294 

17.0587 

17.0880 

17.1172 

17.1464 

17.1756 

17.2047 

17.2337 

17.2627 

17.2916 

17.3205 

17.3494 

17.3781 

17.4069 

17.4356 

17.4642 

17.4929 

17.5214 

17.5499 

17.5784 

17.6068 


96721 

97344 

97969 

98596 

99225 

99856 

100489 

101124 

101761 

102400 


17.6352 

17.6635 

17.6918 

17.7200 

17.7482 

17.7764 

17.8045 

17.8326 

17.8606 

17.8885 











Table 6.—Squares and Square Roots.—(Continued) 


Number 

' 

Square 

321 

103041 

322 

103684 

323 

104329 

324 

104976 

325 

105625 

326 

106276 

327 

106929 

328 

107584 

329 

108241 

330 

108900 

331 

109561 

332 

110224 

333 

110889 

334 

111556 

335 

112225 

336 

112896 

337 

113569 

338 

114244 

339 

114921 

340 

115600 

341 

116281 

342 

116964 

343 

117649 

344 

118336 

345 

119025 

346 

119716 

347 

120409 

348 

121104 

349 

121801 

350 

122500 

351 

123201 

352 

123904 

353 

124609 

354 

125316 

355 

126025 

356 

126736 

357 

127449 

358 

128164 

359 

128881 

360 

129600 


Square root 


17.9165 

361 

17.9444 

362 

17.9722 

363 

18.0000 

364 

18.0278 

365 

18.0555 

366 

18.0831 

367 

18.1108 

368 

18.1384 

369 

18.1659 

370 

18.1934 

371 

18.2209 

372 

18.2483 

373 

18.2757 

374 

18.3030 

375 

18.3303 

376 

18.3576 

377 

18.3848 

378 

18.4120 

379 

18.4391 

380 

18.4662 

381 

18.4932 

382 

18.5203 

383 

18.5472 

384 

18.5742 

385 

18.6011 

386 

18.6279 

387 

18.6548 

388 

18.6815 

389 

18.7083 

390 

18.7350 

391 

18.7617 

392 

18.7883 

393 

18.8149 

394 

18.8414 

395 

18.8680 

396 

18.8944 

397 

18.9209 

398 

18.9473 

399 

18.9737 

400 


Square 

Square root 

130321 

19.0000 

131044 

19.0263 

131769 

19.0526 

132496 

19.0788 

133225 

19.1050 

133956 

19.1311 

134689 

19.1572 

135424 

19.1833 

136161 

19.2094 

136900 

19.2354 

137641 

19.2614 

138384 

19.2873 

139129 

19.3132 

139876 

19.3391 

140625 

19.3649 

141376 

19.3907 

142129 

19.4165 

142884 

19.4422 

143641 

19.4679 

144400 

19.4936 

145161 

19.5192 

145924 

19.5448 

146689 

19.5704 

147456 

19.5959 

148225 

19.6214 

148996 

19.6469 

149769 

19.6723 

150544 

19.6977 

151321 

19.7231 

152100 

19.7484 

152881 

19.7737 

153664 

19.7990 

154449 

19.8242 

155236 

19.8494 

156025 

19.8746 

156816 

19.8997 

157609 

19.9249 

158404 

19.9499 

159201 

19.9750 

160000 

20.0000 


















Table 6.—Squares and Square Roots.—(Continued) 


Number 

Square 

Square root 

Number 

Square 

Square root 

401 

160801 

20.0250 

441 

194481 

21.0000 

402 

161604 

20.0499 

442 

195364 

21.0238 

403 

162409 

20.0749 

443 

196249 

21.0476 

404 

163216 

20.0998 

444 

197136 

21.0713 

405 

164025 

20.1246 

445 

198025 

21.0950 

406 

164836 

20.1494 

446 

198916 

21.1187 

407 

165649 

20.1742 

447 

199809 

21.1424 

408 

166464 

20.1990 

448 

200704 

21.1660 

409 

167281 

20.2237 

449 

201601 

21.1896 

410 

168100 

20.2485 

450 

202500 

21.2132 

411 

168921 

20.2731 

451 

203401 

21.2368 

412 

169744 

20.2978 

452 

204304 

21.2603 

413 

170569 

20.3224 

453 

205209 

21.2838 

414 

171396 

20.3470 

454 

206116 

21.3073 

415 

172225 

20.3715 

455 

207025 

21.3307 

416 

173056 

20.3961 

456 

207936 

21.3542 

417 

173889 

20.4206 

457 

208849 

21.3776 

418 

174724 

20.4450 

458 

209764 

21.4009 

419 

175561 

20.4695 

459 

210681 

21.4243 

420 

176400 

20.4939 

460 

211600 

21.4476 

421 

177241 

20.5183 

461 

212521 

21.4709 

422 

178084 

20.5426 

462 

213444 

21.4942 

423 

178929 

20.5670 

463 

214369 

21.5174 

424 

179776 

20.5913 

464 

215296 

21.5407 

425 

180625 

20.6155 

465 

216225 

21.5639 

426 

181476 

20.6398 

466 

217156 

21.5870 

427 

182329 

20.6640 

467 

218089 

21.6102 

428 

183184 

20.6882 

468 

219024 

21.6333 

429 

184041 

20.7123 

469 

219961 

21.6564 

430 

184900 

20.7364 

470 

220900 

21.6795 

431 

185761 

20.7605 

471 

221841 

21.7025 

432 

186624 

20.7846 

472 

222784 

21.7256 

433 

187489 

20.8087 

473 

223729 

21.7486 

434 

188356 

20.8327 

474 

224676 

21.7715 

435 

189225 

20.8567 

475 

225625 

21.7945 

436 

190096 

20.8806 

476 

226576 

21.8174 

437 

190969 

20.9045 

477 

227529 

21.8403 

438 

191844 

20.9284 

478 

228484 

21.8632 

439 

192721 

20.9523 

479 

229441 

21.8861 

440 

193600 

20.9762 

480 

230400 

21.9089 













Table 6.—Squares and Square Roots.—(Continued) 


Number 

Square 

Square root 

Number 

Square 

Square root 

481 

231361 

21.9317 

521 

271441 

22.8254 

482 

232324 

21.9545 

522 

272484 

22.8473 

483 

233289 

21.9773 

523 

273529 

22.8692 

484 

234256 

22.0000 

524 

274576 

22.8910 

485 

235225 

22.0227 

525 

275625 

22.9129 

486 

236196 

22.0454 

526 

276676 

22.9347 

487 

237169 

22.0681 

527 

277729 

22.9565 

488 

238144 

22.0907 

528 

278784 

22.9783 

489 

239121 

22.1133 

529 

279841 

23.0000 

490 

240100 

22.1359 

530 

280900 

23.0217 

491 

241081 

22.1585 

531 

281961 

23.0434 

492 

242064 

22.1811 

532 

283024 

23.0651 

493 

243049 

22.2036 

533 

284089 

23.0868 

494 

244036 

22.2261 

534 

285156 

23.1084 

495 

245025 

22.2486 

535 

286225 

23.1301 

496 

246016 

22.2711 

536 

287296 

23.1517 

497 

247009 

22.2935 

537 

288369 

23.1733 

498 

248004 

22.3159 

538 

289444 

23.1948 

499 

249001 

22.3383 

539 

290521 

23.2164 

500 

250000 

22.3607 

540 

291600 

23.2379 

501 

251001 

22.3830 

541 

292681 

23.2594 

502 

252004 

22.4054 

542 

293764 

23.2809 

503 

253009 

22.4277 

543 

294849 

23.3024 

504 

254016 

22.4499 

544 

295936 

23.3238 

505 

255025 

22.4722 

545 

297025 

23.3452 

506 

256036 

22.4944 

546 

298116 

23.3666 

507 

257049 

22.5167 

547 

299209 

23.3880 

508 

258064 

22.5389 

548 

300304 

23.4094 

509 

259081 

22.5610 

549 

301401 

23.4307 

510 

260100 

22.5832 

550 

302500 

23.4521 

511 

261121 

22.6053 

551 

303601 

23.4734 

512 

262144 

22.6274 

552 

304704 

23.4947 

513 

263169 

22.6495 

553 

305809 

23.5160 

514 

264196 

22.6716 

554 

306916 

23.5372 

515 

265225 

22.6936 

555 

308025 

23.5584 

516 

266256 

22.7156 

556 

309136 

23.5797 

517 

267289 

22.7376 

557 

310249 

23.6008 

518 

268324 

22.7596 

558 

311364 

23.6220 

519 

269361 

22.7816 

559 

312481 

23.6432 

520 

270400 

22.8035 

560 

313600 

23.6643 













Table 6.—Squares and Square Roots.—(Continued) 


Number 

Square 

Square root 

561 

314721 

23.6854 

562 

315844 

23.7065 

563 

316969 

23.7276 

564 

318096 

23.7487 

565 

319225 

23.7697 

566 

320356 

23.7908 

567 

321489 

23.8118 

568 

322624 

23.8328 

569 

323761 

23.8537 

570 

324900 

23.8747 

571 

326041 

23.8956 

572 

327184 

23.9165 

573 

328329 

23.9374 

574 

329476 

23.9583 

575 

330625 

23.9792 

576 

331776 

24.0000 

577 

332929 

24.0208 

578 

334084 

24.0416 

579 

335241 

24.0624 

580 

336400 

24.0832 

581 

337561 

24.1039 

582 

338724 

24.1247 

583 

339889 

24.1454 

584 

341056 

24.1661 

585 

342225 

24.1868 

586 

343396 

24.2074 

587 

344569 

24.2281 

588 

345744 

24.2487 

589 

346921 

24.2693 

590 

348100 

24.2899 

591 

349281 

24.3105 

592 

350464 

24.3311 

593 

351649 

24.3516 

594 

352836 

24.3721 

595 

354025 

24.3926 

596 

355216 

24.4131 

597 

356409 

24.4336 

598 

357604 

24.4540 

599 

358801 

24.4745 

600 

360000 

24.4949 


Number 

Square 

Square root 

601 

361201 

24.5153 

602 

362404 

24.5357 

603 

363609 

24.5561 

604 

364816 

24.5764 

605 

366025 

24.5967 

606 

367236 

24.6171 

607 

368449 

24.6374 

608 

369664 

24.6577 

609 

370881 

24.6779 

610 

372100 

24.6982 

611 

373321 

24.7184 

612 

374544 

24.7386 

613 

375769 

24.7588 

614 

376996 

24.7790 

615 

378225 

24.7992 

616 

379456 

24.8193 

617 

380689 

24.8395 

618 

381924 

24.8596 

619 

383161 

24.8797 

620 

384400 

24.8998 

621 

385641 

24.9199 

622 

386884 

24.9399 

623 

388129 

24.9600 

624 

389376 

24.9800 

625 

390625 

25.0000 

626 

391876 

25.0200 

627 

393129 

25.0400 

628 

394384 

25.0599 

629 

395641 

25.0799 

630 

396900 

25.0998 

631 

398161 

25.1197 

632 

399424 

25.1396 

633 

400689 

25.1595 

634 

401956 

25.1794 

635 

403225 

25.1992 

636 

404496 

25.2190 

637 

405769 

25.2389 

638 

407044 

25.2587 

639 

408321 

25.2784 

640 

409600 

25.2982 











Table 6.—Squares and Square Roots.—(Continued) 


Number 

Square 

Square root 

Number 

Square 

Square root 

641 

410881 

26.3180 

681 

463761 

26.0960 

642 

412164 

25.3377 

682 

465124 

26.1151 

643 

413449 

25.3674 

683 

466489 

26.1343 

644 

414736 

26.3772 

684 

467856 

26.1534 

646 

416026 

25.3969 

686 

469225 

26.1726 

646 

417316 

26.4166 

686 

470596 

26.1916 

647 

418609 

26.4362 

687 

471969 

26.2107 

648 

419904 

26.4668 

688 

473344 

26.2298 

649 

421201 

26.4755 

689 

474721 

26.2488 

660 

422600 

25.4951 

690 

476100 

26.2679 

661 

423801 

25.6147 

691 

477481 

26.2869 

662 

426104 

25.5343 

692 

478864 

26.3059 

663 

426409 

25.5539 

693 

480249 

26.3249 

664 

427716 

25.6734 

694 

481636 

26.3439 

666 

429026 

25.6930 

696 

483026 

26.3629 

666 

430336 

25.6125 

696 

484416 

26.3818 

667 

431649 

25.6320 

697 

485809 

26.4008 

668 

432964 

26.6515 

698 

487204 

26.4197 

669 

434281 

25.6710 

699 

488601 

26.4386 

660 

436600 

26.6906 

700 

490000 

26.4576 

661 

436921 

26.7099 

701 

491401 

26.4764 

662 

438244 

25.7294 

702 

492804 

26.4963 

663 

439669 

26.7488 

703 

494209 

26.6141 

664 

440896 

26.7682 

704 

495616 

26.5330 

666 

442226 

25.7876 

705 

497025 

26.6518 

666 

443666 

25.8070 

706 

498436 

26.6707 

667 

444889 

25.8263 

707 

499849 

26.5896 

668 

446224 

25.8467 

708 

601264 

26.6083 

669 

447661 

25.8660 

709 

502681 

26.6271 

670 

448900 

25.8844 

710 

604100 

26.6458 

671 

460241 

26.9037 

711 

505621 

26.6646 

672 

461684 

26.9230 

712 

606944 

26.6833 

673 

462929 

26.9422 

713 

608369 

26.7021 

674 

464276 

25.9616 

714 

609796 

26.7208 

676 

466626 

26.9808 

716 

611226 

26.7395 

676 , 

466976 

26.0000 

716 

612656 

26.7682 

677 

458329 

26.0192 

717 

614089 

26.7769 

678 

469684 

26.0384 

718 

616624 

26.7966 

679 

461041 

26.0676 

719 

616961 

26.8142 


462400 

26.0768 

720 

618400 1 

26.8328 

















Table 6.—Squares and Square Roots.—(Continued) 


Number 




Square 

Square root 

721 

519841 

26.8514 

761 

579121 

27.5862 

722 

521284 

26.8701 

762 

580644 

27.6043 

723 

522729 

26.8887 

763 

582169 

27.6225 

724 

524176 

26.9072 

764 

583696 

27.6405 

725 

525625 

26.9258 

765 

585225 

27.6586 

726 

527076 

26.9444 

766 

586756 

27.6767 

727 

528529 

26.9629 

767 

588289 

27.6948 

" 728 

529984 

26.9815 

768 

589824 

27.7128 

729 

531441 

27.0000 

769 

591361 

27.7308 

730 

532900 

27.0185 

770 

592900 

27.7489 

731 

534361 

27.0370 

771 

594441 

27.7669 

732 

535824 

27.0555 

772 

595984 

27.7849 

733 

537289 

27.0740 

773 

597529 

27.8029 

734 

538756 

27.0924 

774 

599076 

27.8209 

735 

540225 

27.1109 

775 

600625 

27.8388 

736 

541696 

27.1293 

776 

602176 

27.8568 

737 

543169 

27.1477 

777 

603729 

27.8747 

738 

544644 

27.1662 

778 

605284 

27.8927 

739 

546121 

27.1846 

779 

606841 

27.9106 

740 

547600 

27.2029 

780 

608400 

27.9285 

741 

549081 

27.2213 

781 

609961 

27.9464 

742 

550564 

27.2397 

782 

611524 

27.9643 

743 

552049 

27.2580 

783 

613089 

27.9821 

744 

553536 

27.2764 

784 

614656 

28.0000 

745 

555025 

27.2947 

785 

616225 

28.0179 

746 

556516 

27.3130 

786 

617796 

28.0357 

747 

558009 

27.3313 

787 

619369 

28.0535 

748 

559504 

27.3496 

788 

620944 

28.0713 

749 

561001 

27.3679 

789 

622521 

28.0891 

750 

562500 

27.3861 

790 

624100 

28.1069 

751 

564001 

27.4044 

791 

625681 

28.1247 

752 

565504 

27.4226 

792 

627264 

28.1425 

753 

567009 

27.4408 

793 

628849 

28.1603 

754 

568516 

27.4591 

794 

630436 

28.1780 

755 

570025 

27.4773 

795 

632025 

28.1957 

756 

571536 

27.4955 

796 

633616 

28.2135 

757 

573049 

27.5136 

797 

635209 

28.2312 

758 

574564 

27.5318 

798 

636804 

28.2489 

759 

576081 

27.5500 

799 

638401 

28.2666 

760 

577600 

27.5681 

800 

640000 

28.2843 












Table 6.—Squares and Square Roots.—(Continued) 



Square 

641601 

643204 

644809 

646416 

648025 

649636 

651249 

652864 

654481 

656100 

657721 

659344 

660969 

662596 

664225 

665856 

667489 

669124 

670761 

672400 

674041 

675684 

677329 

678976 

680625 

682276 

683929 

685584 

687241 

688900 

690561 

692224 

693889 

695556 

697225 

698896 

700569 

702244 

703921 

705600 


Square root I Number 



Square 

Square root 

707281 

29.0000 

708964 

29.0172 

710649 

29.0345 

712336 

29.0517 

714025 

29.0689 

715716 

29.0861 

717409 

29.1033 

719104 

29.1204 

720801 

29.1376 

722500 

29.1548 

724201 

29.1719 

725904 

29.1890 

727609 

29.2062 

729316 

29.2233 

731025 

29.2404 

732736 

29.2575 

734449 

29.2746 

736164 

29.2916 

737881 

29.3087 

739600 

29.3258 

741321 

29.3428 

743044 

29.3598 

744769 

29.3769 

746496 

29.3939 

748225 

29.4109 

749956 

29.4279 

751689 

29.4449 

753424 

29.4618 

755161 

29.4788 

756900 

29.4958 

758641 

29.5127 

760384 

29.5296 

762129 

29.5466 

763876 

29.5635 

765625 

29.5804 

767376 

29.5973 

769129 

29.6142 

770884 

29.6311 

772641 

29.6479 

774400 

29.6648 

















Table 6.—Squares and Square Roots.—(Continued)


776161 

777924 

779689 

781456 

783225

784996 

786769 

788544 

790321 

792100 

793881 

795664 

797449 

799236 

801025 

802816 

804609 

806404 

808201 

810000 

811801 

813604 

815409

817216 

819025 

820836 

822649 

824464 

826281 

828100 

829921 

831744 

833569 

835396 

837225 

839056 

840889 

842724 

844561 

846400 


29.6816 

29.6985 

29.7153

29.7321 

29.7489 

29.7658 

29.7825 

29.7993 

29.8161 

29.8329 

29.8496 

29.8664 

29.8831 

29.8998 

29.9166 

29.9333 

29.9500 

29.9666 

29.9833 

30.0000 

30.0167 

30.0333 

30.0500 

30.0666 

30.0832 

30.0998

30.1164 

30.1330 

30.1496 

30.1662 

30.1828 

30.1993 

30.2159 

30.2324 

30.2490

30.2655 

30.2820 

30.2985

30.3150 

30.3315


Number 

Square 

Square root 

921 

848241 

30.3480 

922 

850084 

30.3645 

923 

851929 

30.3809 

924 

853776 

30.3974 

925 

855625 

30.4138 

926 

857476 

30.4302 

927 

859329 

30.4467 

928 

861184 

30.4631 

929 

863041 

30.4795 

930 

864900 

30.4959 

931 

866761 

30.5123 

932 

868624 

30.5287 

933 

870489 

30.5450 

934 

872356 

30.5614 

935 

874225 

30.5778 

936 

876096 

30.5941 

937 

877969 

30.6105 

938 

879844 

30.6268 

939 

881721 

30.6431 

940 

883600 

30.6594 

941 

885481 

30.6757 

942 

887364 

30.6920 

943 

889249 

30.7083 

944 

891136 

30.7246 

945 

893025 

30.7409 

946 

894916 

30.7571 

947 

896809 

30.7734 

948 

898704 

30.7896 

949 

900601 

30.8058 

950 

902500 

30.8221 

951 

904401 

30.8383 

952 

906304 

30.8545 

953 

908209 

30.8707 

954 

910116 

30.8869 

955 

912025 

30.9031 

956 

913936 

30.9192 

957

915849 

30.9354 

958 

917764 

30.9516 

959 

919681 

30.9677 

960 

921600 

30.9839 












Table 6.—Squares and Square Roots.—(Continued)


Number 

Square 

Square root 

Number 

Square 

Square root 

961 

923521 

31.0000 

981 

962361 

31.3209 

962 

925444 

31.0161 

982 

964324 

31.3369 

963 

927369 

31.0322 

983 

966289 

31.3528 

964 

929296 

31.0483 

984 

968256 

31.3688 

965 

931225 

31.0644 

985 

970225 

31.3847 

966 

933156 

31.0805

986 

972196 

31.4006 

967 

935089 

31.0966 

987 

974169 

31.4166 

968 

937024 

31.1127 

988 

976144 

31.4325 

969 

938961 

31.1288 

989 

978121 

31.4484 

970 

940900 

31.1448 

990 

980100 

31.4643 

971 

942841 

31.1609 

991 

982081 

31.4802 

972 

944784 

31.1769 

992 

984064 

31.4960 

973 

946729 

31.1929 

993 

986049 

31.5119 

974 

948676 

31.2090 

994 

988036 

31.5278 

975

950625 

31.2250 

995 

990025 

31.5436 

976 

952576

31.2410 

996 

992016 

31.5595

977 

954529 

31.2570 

997 

994009 

31.5753 

978 

956484 

31.2730 

998 

996004 

31.5911 

979 

958441 

31.2890 

999 

998001 

31.6070 

980 

960400 

31.3050 

1000 

1000000 

31.6228 









Foreword to Table 7.—Logarithms are the greatest labor-saving discovery 
ever made in the field of mathematics. With their aid, many calculations 
can be performed easily and quickly that would not be feasible at all without 
them. 

The common logarithm of a number is the power to which 10 must be
raised to produce that number. For example, $10^2 = 100$, so the logarithm
of 100 is 2. Similarly, $90 = 10^{1.95424}$, and the logarithm of 90 is 1.95424.
In general, if $Y = 10^x$, then $\log Y = x$.
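
As a quick check of this definition, here is a minimal Python sketch. It is not part of the original text; the helper name `common_log` is illustrative, and only the standard `math` module is assumed.

```python
import math


def common_log(y: float) -> float:
    """Return the power to which 10 must be raised to produce y."""
    return math.log10(y)


print(round(common_log(100), 5))  # 2.0
print(round(common_log(90), 5))   # 1.95424
# Raising 10 to the logarithm recovers the number (up to rounding error).
print(abs(10 ** common_log(90) - 90) < 1e-9)  # True
```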

That part of the logarithm to the left of the decimal is called the characteristic,
while that part to the right of the decimal is the mantissa. Thus,
for log 90 = 1.95424, the characteristic is 1, the mantissa is .95424.
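
The split into characteristic and mantissa can be sketched in Python as below. The function name `split_log` is hypothetical, and rounding the mantissa to five places is an assumption made to match a five-place table.

```python
import math


def split_log(y: float) -> tuple[int, float]:
    """Split log10(y) into (characteristic, mantissa), with 0 <= mantissa < 1."""
    log_y = math.log10(y)
    characteristic = math.floor(log_y)  # whole-number part
    mantissa = log_y - characteristic   # decimal part, never negative
    return characteristic, round(mantissa, 5)


print(split_log(90))    # (1, 0.95424)
print(split_log(1503))  # (3, 0.17696)
```

Taking the floor rather than truncating keeps the mantissa positive even for numbers smaller than 1, which is the convention a table of mantissas requires.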

There are three fundamental principles that are constantly needed in 
working with logarithms: 

1. $\log (ab) = \log a + \log b$,

2. $\log (a/b) = \log a - \log b$,

3. $\log (a^n) = n \log a$.
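
The three principles can be checked numerically. The following is only an illustrative sketch: the function name `log_principles_hold` and the sample values are assumptions, not taken from the text.

```python
import math


def log_principles_hold(a: float, b: float, n: float) -> bool:
    """Check the product, quotient, and power rules for common logarithms."""
    product = math.isclose(math.log10(a * b), math.log10(a) + math.log10(b))
    quotient = math.isclose(math.log10(a / b), math.log10(a) - math.log10(b))
    power = math.isclose(math.log10(a ** n), n * math.log10(a))
    return product and quotient and power


print(log_principles_hold(5.0, 11.0, 3))  # True
```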

To find the mantissa of any number, we enter Table 7, find the first three 
digits of the number in the left-hand column, headed “No.,” and find the 
fourth digit in the top row, then read off the mantissa from the proper row 
and column. Thus, for the number 1,503, we find 150 in the first column 
and “3” in the fifth column of the table, and read off the mantissa .17696. 

The characteristic of a logarithm is discovered by placing the pencil point 
between the first two significant figures (the first figure that is not zero is the 
first significant figure) of the number, and moving it to the right or left so 
many places to the decimal point. If the pencil is moved to the right, the 
characteristic is positive; if to the left, it is negative. Thus, for the number 
1,503, the pencil is placed between 1 and 5, and moved three places to the 
right. The characteristic is therefore 3, and the complete logarithm is 
3.17696. 

If the number is 15,030, the mantissa is the same, but the characteristic 
is 4, so that the logarithm is 4.17696. 
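
The pencil rule can be sketched in Python as follows. The helper name `characteristic`, and the choice to pass the number as a string so that the position of the decimal point is explicit, are illustrative assumptions.

```python
def characteristic(number: str) -> int:
    """Apply the pencil rule to a number written as a decimal string."""
    digits = number.replace(",", "")
    if "." not in digits:
        digits += "."  # a whole number has its decimal point at the end
    point = digits.index(".")
    # Position of the first significant (nonzero) figure.
    first = next(i for i, ch in enumerate(digits) if ch in "123456789")
    if first < point:
        # The pencil sits just after the first figure and moves right.
        return point - first - 1
    # The pencil moves left, past any leading zeros, to reach the point.
    return -(first - point)


print(characteristic("1,503"))   # 3
print(characteristic("15,030"))  # 4
print(characteristic(".15037"))  # -1
```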

In the case of the number 15,037, the exact mantissa cannot be read
directly from Table 7, but an approximate mantissa can be obtained by
taking the mantissa of the number 1,504, or, more accurately, by interpolating.
To interpolate, we subtract the mantissa of the number just smaller
than the given number from the mantissa of the number just larger, then
subtract from the given number the table number just smaller, place a
decimal point before the first figure of this last difference, multiply the first
difference by this value, and add the product to the mantissa of the table
number just smaller than the given number. Thus, the mantissa of 15,030
is .17696, the mantissa of 15,040 is .17725, and their difference is .00029; the
difference between the given number and the table number just smaller is
15,037 - 15,030 = 7, which becomes .7; the product of the first difference
and this value is .00029 × .7 = .000203; and this product added to the
mantissa of 15,030 is .17696 + .000203 = .17716. The logarithm of 15,037
is therefore 4.17716.
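
The interpolation just described is ordinary linear interpolation between two tabular entries. A minimal Python sketch, using the two mantissas quoted above, might look like this; the function name `interpolate_mantissa` and its parameter names are hypothetical.

```python
def interpolate_mantissa(number, lower, upper, mantissa_lower, mantissa_upper):
    """Linearly interpolate a mantissa between two neighboring table entries."""
    # Fraction of the way from the smaller table number to the larger one.
    fraction = (number - lower) / (upper - lower)
    return mantissa_lower + fraction * (mantissa_upper - mantissa_lower)


m = interpolate_mantissa(15037, 15030, 15040, 0.17696, 0.17725)
print(round(m, 5))      # 0.17716
print(round(4 + m, 5))  # 4.17716, the logarithm of 15,037
```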

Suppose that the number whose logarithm is required is 15.037. The
mantissa is the same as that just found for the number 15,037, but the
characteristic changes from 4 to 1, so that the logarithm is now 1.17716.
Similarly, the logarithm of 1.5037 is 0.17716.

If we move the decimal one or more places to the left, giving, say, the
number .15037, we do not change the mantissa, but we encounter a negative
characteristic. For, if we put a pencil between 1 and 5, we must move it
one place to the left to reach the decimal point. The characteristic is then
-1. To avoid these awkward negative characteristics, it is customary to
write -1 in the form 9 . . . . - 10, -2 in the form 8 . . . . - 10, etc.
Hence, the logarithm of .15037 is written 9.17716 - 10.
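
The customary way of writing a negative characteristic can be sketched in Python as below. The function name `book_form` is hypothetical, and the sketch assumes the characteristic is no smaller than -10.

```python
import math


def book_form(y: float) -> str:
    """Write log10(y) to five places, using the '9.xxxxx - 10' form if y < 1."""
    log_y = math.log10(y)
    characteristic = math.floor(log_y)
    mantissa = log_y - characteristic
    if characteristic >= 0:
        return f"{characteristic + mantissa:.5f}"
    # Add 10 to the negative characteristic and subtract it again in writing.
    return f"{10 + characteristic + mantissa:.5f} - 10"


print(book_form(0.15037))  # 9.17716 - 10
print(book_form(15.037))   # 1.17716
```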

To obtain from Table 7 the number corresponding to a logarithm, we 
find in the table the mantissa of the logarithm, write down the number 
corresponding to it, and then point off this number in accordance with the 
characteristic of the logarithm. For example, given the logarithm 2.27921, 
we look in the table for the mantissa .27921, read off the corresponding 
number 1,902, and point off this number by placing our pencil point between 
the figures 1 and 9, then moving it two places to the right as indicated by the 
positive characteristic, 2, getting 190.2 as the result. If the logarithm is 
0.27921, the number is 1.902; if the logarithm is 8.27921 - 10, the number
is 0.01902; and so on. 
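
Going from a logarithm back to its number amounts to raising 10 to that power. The sketch below computes the result directly rather than by table lookup; the function name `antilog` is an illustrative assumption, and a logarithm written 8.27921 - 10 is passed with characteristic -2.

```python
def antilog(characteristic: int, mantissa: float) -> float:
    """Return the number whose common logarithm is characteristic + mantissa."""
    return 10 ** (characteristic + mantissa)


print(round(antilog(2, 0.27921), 1))   # 190.2
print(round(antilog(0, 0.27921), 3))   # 1.902
print(round(antilog(-2, 0.27921), 5))  # 0.01902
```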

Let us now find a geometric mean by the use of logarithms. By formula
(14) of Chap. VII,

$G = (5 \cdot 11 \cdot 19)^{1/3}$.

According to principle 3, above,

$\log G = \frac{1}{3} \log (5 \cdot 11 \cdot 19)$.

And by principle 1,

$\log G = \frac{1}{3} (\log 5 + \log 11 + \log 19)$.

Now, the numbers 5, 11, and 19 do not appear in Table 7, but the numbers
5,000, 1,100, and 1,900, which have the same mantissas, may be found there.
The mantissa of 5,000 is .69897, and the characteristic of 5 is 0, so log 5 is
0.69897. In the same way, we find log 11 = 1.04139, and log 19 = 1.27875.
We therefore have

$\log G = \frac{1}{3} (0.69897 + 1.04139 + 1.27875) = \frac{1}{3} (3.01911) = 1.00637$.

Looking in the table for the mantissa .00637, the nearest we can find to it is 
the mantissa .00647, to which the corresponding number is 1,015. Pointing 
this off according to the characteristic, 1, we get 10.15 as the geometric 
mean. If greater accuracy is wanted, we may interpolate in the table. Our 
mantissa, .00637, falls between the two tabular mantissas .00604 and .00647. 
We therefore have .00637 - .00604 = .00033; .00647 - .00604 = .00043;
and .00033/.00043 = .767. That is, our mantissa indicates a number about
three-fourths of the way between 10.14 and 10.15, or roughly 10.14 + .00767 = 10.14767.

Here 5 · 11 · 19 means 5 × 11 × 19.
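
The same geometric mean can be computed without a table. The following Python sketch averages the logarithms and takes the antilogarithm; the function name `geometric_mean` is illustrative, and the result differs slightly from the interpolated table value because no five-place rounding is involved.

```python
import math


def geometric_mean(values):
    """Geometric mean via logarithms: antilog of the mean of the logs."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


g = geometric_mean([5, 11, 19])
print(round(math.log10(g), 5))  # 1.00637
print(round(g, 4))              # 10.1478
```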



Table 7.—Five-place Common Logarithms of Numbers

100-149


No.  0  1  2  3  4  5  6  7  8  9

100 

00 

000 

00 

043 

00 

087 

00 

130 

00 

173 

00 

217 

00 

260 

00 

303 

00 

346 

00 

389 

101 

00 

432 

00 

475 

00 

518 

00 

561 

00 

604 

00 

647 

00 

689 

00 

732 

00 

775 

00 

817 

102 

00 

860 

00 

903 

00 

945 

00 

988 

01 

030 

01 

072 

01 

115 

01 

157 

01 

199 

01 

242

103 

01 

284 

01 

826 

01 

868 

01 

410 

01 

452 

01 

494 

01 

536 

01 

578 

01 

620 

01 

662 

104 

01 

703 

01 

745 

01 

787 

01 

828 

01 

870 

01 

912 

01 

053 

01 

995 

02 

036 

02 

078 

105

02 

119 

02 

160 

02 

202 

02 

243 

02 

284 

02 

825 

02 

866 

02 

407 

02 

449 

02 

490 

106 

02 

531 

02 

572 

02 

612 

02 

653 

02 

604 

02 

736 

02 

776 

02 

816 

02 

857 

02 

898 

107 

02 

938 

02 

979 

03 

019 

03 

060 

03 

100 

03 

141 

03 

181 

03 

222 

03 

262 

03 

302 

108 

03 

342 

03 

383 

03 

423 

03 

463 

03 

503 

03 

543 

03 

583 

03 

623 

03 

663 

03 

703 

109 

03 

743 

03 

782 

03 

822 

03 

862 

03 

902 

03 

941 

03 

981 

04 

021 

04 

060 

04 

100 

110 

04 

139 

04 

179 

04 

218 

04 

258 

04 

297 

04 

836 

04 

876 

04 

415 

04 

464 

04 

493 

111 

04 

532 

04 

571 

04 

610 

04 

660 

04 

689 

04 

727 

04 

766 

04 

805 

04 

844 

04 

883 

112 

04 

922 

04 

961 

04 

999 

05 

038 

05 

077 

05 

115 

05 

154 

05 

192 

06 

231 

05 

269 

113

05 

308 

06 

846 

05 

386 

05 

423 

05 

461 

05 

500 

05 

538 

06 

576 

05 

614 

05 

652 

114 

05 

690 

05 

729 

05 

767 

05 

806 

05 

843 

05 

881 

05 

918 

06 

956 

06 

994 

06 

032 

115 

06 

070 

06 

108 

06 

145 

06 

183 

06 

221 

06 

258 

06 

296 

06 

333 

06 

371 

06 

408 

116 

06 

446 

06 

483 

06 

521 

06 

558 

06 

505 

06 

633 

06 

670 

06 

707 

06 

744 

06 

781 

117 

06 

819 

06 

866 

06 

803 

06 

030 

06 

967 

07 

004 

07 

041 

07 

078 

07 

115 

07 

151 

118 

07 

188 

07 

226 

07 

262 

07 

208 

07 

335 

07 

372 

07 

408 

07 

445 

07 

482 

07 

518 

119 

07 

566 

07 

591 

07 

628 

07 

664 

07 

700 

07 

737 

07 

773 

07 

809 

07 

846 

07 

882 

120

07 

918 

07 

964 

07 

990 

08 

027 

08 

063 

08 

099 

08 

135 

08 

171 

08 

207 

08 

243 

121 

08 

279 

08 

314 

08 

350 

08 

386 

08 

422 

08 

458 

08 

493 

08 

529 

08 

565 

08 

600 

122 

08 

636 

08 

672 

08 

707 

08 

743 

08 

778 

08 

814 

08 

849 

08 

884 

08 

920 

08 

955 

123 

08 

991 

09 

026 

09 

061 

09 

096 

09 

132 

09 

167 

00 

202 

09 

237 

09 

272 

09 

307 

124 

09 

842 

09 

377 

09 

412 

09 

447 

09 

482 

09 

517 

09 

552 

09 

587 

09 

621 

09 

656 

125 

09 

691 

09 

726 

09 

760 

09 

795 

09 

830 

09 

864 

09 

899 

09 

934 

09 

968 

10 

003 

126 

10 

037 

10 

072 

10 

106 

10 

140 

10 

175 

10 

209 

10 

243 

10 

278 

10 

312 

10 

346 

127 

10 

380 

10 

416 

10 

449 

10 

483 

10 

517 

10 

551 

10 

586 

10 

619 

10 

653 

10 

687 

128 

10 

721 

10 

756 

10 

789 

10 

823 

10 

857 

10 

890 

10 

924 

10 

958 

10 

992 

11 

025 

129 

11 

059 

11 

093 

11 

126 

11 

160 

11 

193 

11 

227 

11 

261 

11 

294 

11 

327 

11 

361 

130

11 

394 

11 

428 

11 

461 

11 

404 

11 

528 

11 

561 

11 

594 

11 

628 

11 

661 

11 

694 

131 

11 

727 

11 

760 

11 

793 

11 

826 

11 

800 

11 

893 

11 

926 

11 

959 

11 

992 

12 

024 

132

12 

067 

12 

090 

12 

123 

12 

156 

12 

180 

12 

222 

12 

254 

12 

287 

12 

320 

12 

352 

133 

12 

885 

12 

418 

12 

450 

12 

483 

12 

516 

12 

548 

12 

581 

12 

613 

12 

646 

12 

678 

134

12 

710 

12 

743 

12 

775 

12 

808 

12 

840 

12 

372 

12 

905 

12 

937 

12 

969 

13 

001 

135 

13 

033 

18 

066 

13 

008 

13 

130 

13 

162 

18 

194 

18 

226 

13 

258 

13 

290 

13 

322 

136 

18 

854 

13 

886 

13 

418 

13 

460 

18 

481 

13 

513 

13 

546 

13 

577 

13 

609 

13 

640 

137 

13 

672 

13 

704 

13 

735 

13 

767 

13 

799 

13 

830 

13 

862 

13 

893 

13 

925 

13 

956 

138 

13 

988 

14 

019 

14 

061 

14 

082 

14 

114 

14 

145 

14 

176 

14 

208 

14 

239 

14 

270 

139 

14 

801 

14 

833 

14 

864 

14 

895 

14 

426 

14 

457 

14 

489 

14 

520 

14 

551 

14 

582 

140 

14 

618 

14 

644 

14 

676 

14 

706 

14 

737 

14 

768 

14 

799 

14 

829 

14 

860 

14 

891 

141 

14 

922 

14 

958 

14 

983 

15 

014 

15 

045 

15 

076 

16 

106 

16 

137 

16 

168 

15 

198 

142 

15 

229 

15 

269 

15 

290 

15 

320 

15 

351 

15 

381 

16 

412 

16 

442 

16 

473 

15 

503 

143 

15 

534 

15 

564 

15 

594 

15 

626 

15 

655 

15 

685 

16 

715 

16 

746 

15 

776 

15 

806 

144 

15 

836 

15 

866 

15 

897 

15 

927 

15 

957 

15 

087 

16 

017 

16 

047 

16 

077 

16 

107 

145 

16 

137

16

167 

16 

197 

16 

227 

16 

256 

16 

286 

16 

316 

16 

346 

16 

376 

16 

406 

146 

16 

435 

16 

465 

16 

406 

16 

524 

16 

554 

16 

584 

16 

613 

16 

643 

16 

673 

16 

702 

147 

16 

732 

16 

761 

16 

791 

16 

820 

16 

850 

16 

879 

16 

909 

16 

938 

16 

967 

16 

997 

148 

17 

026 

17 

056 

17 

086 

17 

114 

17 

143 

17 

173 

17 

202 

17 

231 

17 

260 

17 

289 

149 

17 

819 

17 

848 

17 

877 

17 

406 

17 

435 

17 

464 

17 

403 

17 

522 

17 

551 

17 

580 

Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

150-199


No.  0  1  2  3  4  5  6  7  8  9

150 

17 

609 

17 

638 

17 

667 

17 

696 

17 

725

17 754 

17 

782 

17 

811 

17 

840 

17 

869 

151 

17 

898 

17 

926 

17 

955 

17 

984 

18 

013 

18 041 

18 

070 

18 

099 

18 

127 

18 

156 

152

18 

184 

18 

213 

18 

241 

18 

270 

18 

298 

18 327 

18 

355 

18 

384 

18 

412 

18 

441

153

18 

469 

18 

498 

18 

526 

18 

554 

18 

583 

18 611 

18 

639 

18 

667 

18 

696 

18 

724 

154

18 

752 

18 

780 

18 

808 

18 

837 

18 

865 

18 893 

18 

921 

18 

949 

18 

977 

19 

005 

155

19 

033 

19 

061 

19 

089 

19 

117 

19 

145 

19 173 

19 

201 

19 

229 

19 

257 

19 

285 

156 

19 

312 

19 

340 

19 

368 

19 

396 

19 

424 

19 451 

19 

479 

19 

507 

19 

535 

19 

562 

157 

19 

590 

19 

618 

19 

645 

19 

673 

19 

700 

19 728 

19 

756 

19 

783 

19 

811 

19 

838 

158 

19 

866 

19 

893 

19 

921 

19 

948 

19 

976 

20 003 

20 

030 

20 

058 

20 

085 

20 

112 

159 

20 

140 

20 

167 

20 

194 

20 

222 

20 

249 

20 276 

20 

303 

20 

330 

20 

358 

20 

385 

160 

20 

412 

20 

439 

20 

466 

20 

493 

20 

520 

20 548 

20 

575 

20 

602 

20 

629 

20 

656 

161 

20 

683 

20 

710 

20 

737 

20 

763 

20 

790 

20 817 

20 

844 

20

871 

20 

898 

20 

925 

162 

20 

952 

20 

978 

21 

005 

21 

032 

21 

059 

21 085 

21 

112 

21 

139 

21 

165 

21 

192 

163 

21 

219 

21 

245 

21 

272 

21 

299 

21 

325 

21 352 

21 

378 

21 

405 

21 

431 

21 

458 

164 

21 

484 

21 

511 

21 

537 

21 

564 

21 

590 

21 617 

21 

643 

21 

669 

21 

696 

21 

722 

165 

21 

748 

21 

778 

21 

801 

21 

827 

21 

854 

21 880 

21 

906 

21 

932 

21 

958 

21 

985 

166 

22 

011

22 

037 

22 

063 

22 

089 

22 

115 

22 141 

22 

167 

22 

194 

22 

220 

22 

246 

167 

22 

272 

22 

298 

22 

324 

22 

380 

22 

376 

22 401 

22 

427 

22 

453 

22 

479 

22 

505 

168 

22 

531 

22 

557 

22 

583 

22 

608 

22 

634 

22 660 

22 

686 

22 

712 

22 

737 

22 

763 

169 

22 

789 

22 

814 

22 

840 

22 

866 

22 

891 

22 917 

22 

943 

22 

968 

22 

994 

23 

019 

170 

23 

048 

23 

070 

23 

096 

23 

121 

23 

147 

23 172 

23 

198 

23 

223 

23 

249 

23 

274 

171 

23 

300 

23 

325 

23 

350 

23 

376 

23 

401 

23 426 

23 

452 

23 

477 

23 

502 

23 

528 

172 

23 

553 

23 

578 

23 

603 

23 

629 

23 

654 

23 679 

23 

704 

23 

729 

23 

754 

23 

779 

173 

23 

808 

23 

830 

23 

858 

23 

880 

23 

905 

23 930 

23 

956 

23 

980 

24 

005 

24 

030 

174 

24 

058 

24 

080 

24 

108 

24 

130 

24 

158 

24 180 

24 

204 

24 

229 

24 

254 

24 

279 

175 

24 

304 

24 

329 

24 

353 

24 

378 

24 

403 

24 428 

24 

452 

24 

477 

24 

502 

24 

527 

176 

24 

551 

24 

576 

24 

601 

24 

625 

24 

650 

24 674 

24 

699 

24 

724 

24 

748 

24 

773 

177 

24 

797 

24 

822 

24 

846 

24 

871 

24 

895 

24 920 

24 

944 

24 

969 

24 

993 

25 

018 

178 

25 

042 

25 

066 

25 

091 

25 

115 

25 

139 

25 164 

25 

188 

25 

212 

25 

237 

25 

261 

179 

25 

285 

25 

310 

25 

334 

25 

358 

25 

382 

25 406 

25 

431 

25 

455 

25 

479 

25 

503 

180 

25 

527 

25 

551 

25 

575 

25 

600 

25 

624 

25 648 

25 

672 

25 

696 

25 

720 

25 

744 

181 

25 

768 

25 

792 

25 

816 

25 

840 

25 

864 

25 888 

25 

912 

25 

935 

25 

959 

25 

983 

182 

26 

007 

26 

031 

26 

058 

26 

079 

26 

102 

26 126 

26 

150 

26 

174 

26 

198 

26 

221 

183 

26 

245 

26 

269 

26 

293 

26 

316 

26 

340 

26 364 

26 

387 

26 

411 

26 

435 

26 

458 

184 

26 

482 

26 

505 

26 

529 

26 

553 

26 

576 

26 600 

26 

623 

26 

647 

26 

670 

26 

694 

185 

26 

717 

26 

741 

26 

764 

26 

788 

26 

811 

26 834 

26 

858 

26 

881 

26 

905 

26 

928 

186 

26 

951 

26 

978 

26 

998 

27 

021 

27 

046 

27 068 

27 

091 

27 

114 

27 

138 

27 

161 

187 

27 

184 

27 

207 

27 

231 

27 

254 

27 

277 

27 300 

27 

323 

27 

346 

27 

370 

27 

393 

188 

27 

416 

27 

439 

27 

462 

27 

485 

27 

508 

27 531 

27 

554 

27 

677 

27 

600 

27 

623 

189 

27 

646 

27 

669 

27 

692 

27 

715 

27 

738 

27 761 

27 

784 

27 

807 

27 

830 

27 

852 

190 

27 

875 

27 

898 

27 

921 

27 

944 

27 

967 

27 989 

28 

012 

28 

035 

28 

058 

28 

081 

191 

28 

103 

28 

126 

28 

149 

28 

171 

28 

194 

28 217 

28 

240 

28 

262 

28

285

28 

307 

192 

28 

330 

28 

353 

28 

375 

28 

398 

28 

421 

28 443 

28 

466 

28 

488 

28 

511 

28 

533 

193 

28 

556 

28 

578 

28 

601 

28 

623 

28 

646 

28 668 

28 

691 

28 

713 

28 

735 

28 

758 

194 

28 

780 

28 

803 

28 

828 

28 

847 

28 

870 

28 892 

28 

914 

28 

937 

28 

950 

28 

981 

195 

29 

003 

29 

026 

29 

048 

29 

070 

29 

092 

29 116 

29 

137 

29 

159 

29 

181 

29 

203 

196 

29 

226 

29 

248 

29 

270 

29 

292 

29 

314 

29 336 

29 

358 

29 

380 

29 

403 

29 

425 

197 

29

447 

29 

469 

29 

491 

29 

513 

29 

536 

29 557 

29 

679 

29 

601 

29 

623 

29 

645 

198 

29 

667 

29 

688 

29 

710 

29 

732 

29 

754 

29 776 

29 

798 

29 

820 

29 

842 

29 

863 

199

29 

885 

29 

907 

29 

929 

29 

951 

29 

973 

29 994 

30 

016 

30 

038 

30 

060 

30 

081 

Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

200-249


No.  0  1  2  3  4  5  6  7  8  9

200

30 

103 

30 

125 

30 146 

30 

168 

30 

190 

30 

211 

30 

233 

30 

255 

30 

276 

30 

208 

201 

30 

320 

30 

341 

30 863 

30 

384 

30 

406 

30 

428 

30 

449 

30 

471 

30 

492 

30 

514 

202 

30 

535 

30 

557 

30 578 

30 

600 

30 

621 

30 

643 

30 

664 

30 

685 

30 

707 

30 

728 

203 

30 

750

30 

771 

30 702 

30 

814 

30 

835 

30 

856 

30 

878 

30 

899 

30 

920 

80 

942 

204 

30 

963 

30 

984 

31 006 

31 

027 

31 

048 

31 

069 

81 

091 

31 

112 

31 

133 

31 

154 

205 

31 

175 

81 

197 

81 218 

31 

239 

31 

260 

31 

281 

31 

302 

31 

323 

31 

345 

31 

366 

206 

31 

387 

31 

408 

31 429 

31 

450 

31 

471 

31 

492 

31 

513 

31 

534 

31 

555 

31 

576 

207 

31 

697 

31 

618 

31 639 

31 

660 

31 

681 

31 

702 

31 

723 

31 

744 

31 

765 

31 

785 

208 

31 

806 

31 

827 

31 848 

31 

869 

31 

890 

31 

911 

31 

931 

31 

052 

31 

073 

31 

904 

209

32

015

32 

035 

32 056 

32 

077 

82 

098 

32 

118 

32 

139 

32 

160 

32 

181 

32 

201 

210 

32 

222 

32 

243 

32 263 

32 

284 

32 

305 

32 

325 

32 

346 

32 

366 

32 

387 

32 

408 

211 

32 

428 

32 

449 

32 469 

32 

490 

32 

510 

32 

531 

32 

552 

32 

572 

32 

593 

32 

613 

212 

32 

634 

32 

654 

32 675 

32 

695 

32 

715 

32 

736 

32 

756 

32 

777 

32 

797 

32 

818 

213 

32 

838 

32 

858 

32 879 

32 

899 

32 

919 

32 

940 

82 

960 

32 

980 

33 

001 

33 

021 

214 

33 

041 

33 

062 

33 082 

33 

102 

33 

122 

33 

143 

33 

163 

33 

183 

33 

203 

33 

224 

215 

33 

244 

33 

264 

33 284 

33 

304 

33 

325 

33 

345 

33 

365 

33 

385 

33 

405 

33 

425 

216 

33 

445 

33 

465 

33 486 

33 

506 

33 

526 

33 

546 

33 

566 

33 

586 

33 

606 

33 

626 

217 

33 

646 

33 

666 

33 686 

33 

706 

33 

728 

33 

746 

33 

766 

33 

786 

33 

806 

33 

826 

218 

33 

846 

33 

866 

33 885 

33 

905 

33 

925 

33 

945 

33 

965 

33 

985 

34 

005 

34 

025 

219

34 

044 

34 

064 

34 084 

34 

104 

34 

124 

34 

143 

34 

163 

34 

183 

34 

203 

34 

223 

220 

34 

242 

34 

262 

34 282 

34 

301 

34 

321 

34 

341 

34 

361 

34 

380 

34 

400 

34 

420 

221 

34 

439 

34 

459 

34 479 

34 

498 

34 

518 

34 

537 

34 

557 

34 

577 

34 

596 

34 

616 

222 

34 

635 

34 

655 

34 674 

34 

694 

34 

713 

34 

733 

34 

753 

34 

772 

34 

792 

34 

811 

223 

34 

830 

34 

850 

34 869 

34 

889 

84 

908 

34 

928 

34 

947 

34 

967 

34 

986 

35 

005 

224 

35 

025 

35 

044 

35 064 

35 

083 

35 

102 

35 

122 

85 

141 

35 

160 

85 

180 

35 

199 

225

35 

218 

35 

238 

35 257 

35 

276 

35 

295 

35 

315 

35 

334 

35 

353 

35 

372 

35 

392 

226 

35 

411 

35 

430 

35 449 

35 

468 

35 

488 

35 

507 

35 

526 

35 

543 

35 

564 

35 

583 

227 

35 

603 

35 

622 

35 641 

35 

660 

35 

679 

35 

608 

35 

717 

35 

736 

35 

755 

35 

774 

228 

35 

793 

35 

813 

35 832 

35 

851 

35 

870 

35 

889 

35 

908 

35 

927 

35 

946 

35 

965 

229 

35 

984 

36 

003 

36 021 

36 

040 

36 

059 

36 

078 

36 

097 

36 

116 

36 

135 

36 

154 

230

36 

173 

36 

102 

36 211 

36 

229 

36 

248 

36 

267 

36 

286 

36 

305 

36 

324 

36 

342 

231 

36 

361 

36 

380 

36 399 

36 

418 

86 

436 

36 

455 

36 

474 

36 

493 

36 

511 

36 

530 

232 

36 

549 

36 

568 

36 586 

36 

605

36 

624 

36 

642 

36 

661 

36 

680 

36 

698 

36 

717 

233 

36 

736 

36 

754 

36 773 

36 

791 

36 

810 

36 

829 

36 

847 

36 

866 

36 

884 

36 

903 

234 

36 

922 

86 

940 

36 959 

36 

977 

36 

906 

37 

014 

37 

033 

37 

051 

37 

070 

37 

088

235 

37 

107 

87 

125 

87 144 

37 

162 

37 

181 

37 

199 

37 

218 

37 

236 

37 

254 

37 

273 

236 

87 

291 

37 

310 

37 328 

37 

346 

37 

365 

37 

383 

37 

401 

37 

420 

37 

438 

37 

457 

237 

87 

475 

87 

493 

37 511 

37 

530 

37 

548 

37 

566 

37 

585 

37 

603 

37 

621 

87 

639 

238 

37 

658 

87 

676 

87 694 

37 

712 

37 

731 

37 

749 

37 

767 

37 

785 

37 

803 

37 

822 

239 

37 

840 

37 

858 

37 876 

37 

894 

37 

912 

37 

931 

37 

049 

37 

967 

37 

985 

38 

003 

240 

38 

021 

38 

039 

38 057 

38 

075 

38 

093 

88 

112 

38 

130 

38 

148 

38 

166 

38 

184 

241 

38 

202 

38 

220 

88 238 

38 

256 

38 

274 

38 

292 

38 

310 

38 

328 

38 

346 

38 

364 

242 

38 

382 

38 

399 

88 417 

88 

435 

38 

453 

38 

471 

38 

489 

38 

507 

38 

525 

38 

543 

243 

38 

561 

38 

578 

38 596 

88 

614 

88 

632 

38 

650 

38 

668 

88 

686 

38 

703 

38 

721 

244 

38 

739 

88 

757 

38 775 

88 

792 

88 

810 

38 

828 

38 

846 

38 

863 

88 

881 

38 

899 

245 

38 

917 

88 

034 

38 952 

38 

970 

38 

987 

39 

005 

39 

023 

39 

041 

39 

058 

39 

076 

246 

30 

094 

89 

111 

39 129 

39 

146 

39 

164 

39 

182 

39 

199 

39 

217 

39 

235 

39 

252 

247 

30 

270 

30 

287 

39 305 

30 

322 

89 

840 

39 

358 

39 

375 

39 

393 

39 

410 

80 

428 

248 

39 

445 

39 

463 

39 480 

39 

408 

89 

515 

39 

533 

39 

550 

39 

568 

39 

585 

39 

602 

249 

89 

620 

30 

637 

39 655 

39 

672 

39 

690 

39 

707 

39 

724 

89 

742 

30 

759 

89 

777 

Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

250-299


No.  0  1  2  3  4  5  6  7  8  9

250

39 

794 

30 

811 

30 

829 

39 

840 

80 

863 

30 

881 

39 

808 

30 

015 

39 

033 

89 

950 

251 

39 

067 

39

985

40 

002 

40 

019 

40 

037 

40 

054 

40 

071 

40 

088 

40 

106 

40 

123 

252 

40 

140 

40 

157 

40 

175 

40 

102 

40 

200 

40 

226 

40 

243 

40 

261 

40 

278 

40 

206 

253 

40 

312 

40 

829 

40 

846 

40 

864 

40 

381 

40 

308 

40 

415 

40 

432 

40 

449 

40 

466 

254 

40 

483 

40 

600 

40 

618 

40 

635 

40 

652 

40 

669 

40 

686 

40 

603 

40 

620 

40 

637 

255

40 

654 

40 

671 

40 

688 

40 

706 

40 

722 

40 

739 

40 

766 

40 

773 

40 

790 

40 

807 

256 

40 

824 

40 

841 

40 

868 

40 

875 

40 

802 

40 

009 

40 

026 

40 

943 

40 

060 

40 

076 

257

40 

003 

41 

010 

41 

027 

41 

044 

41 

061 

41 

078 

41 

005 

41 

111 

41 

128 

41 

145 

258 

41 

162 

41 

179 

41 

106 

41 

212 

41 

220 

41 

246 

41 

263 

41 

280 

41 

206 

41 

313 

259

41 

330 

41 

847 

41 

863 

41 

880 

41 

397 

41 

414 

41 

430 

41 

447 

41 

464 

41 

481 

260 

41 

497 

41 

514 

41 

631 

41 

547 

41 

564 

41 

581 

41 

607 

41 

614 

41 

631 

41 

647 

261 

41 

664 

41 

681 

41 

607 

41 

714 

41 

731 

41 

747 

41 

764 

41 

780 

41 

707 

41 

814 

262 

41 

830 

41 

847 

41 

863 

41 

880 

41 

806 

41 

913 

41 

029 

41 

046 

41 

063 

41 

079 

263 

41 

006 

42 

012 

42 

029 

42 

045 

42 

062 

42 

078 

42 

005 

42 

111 

42 

127 

42 

144 

264 

42 

160 

42 

177 

42 

103 

42 

210 

42 

226 

42 

243 

42 

250 

42 

275 

42 

202 

42 

308 

265

42 

326 

42 

341 

42 

857 

42 

874 

42 

300 

42 

406 

42 

423 

42 

439 

42 

456 

42 

472 

266 

42 

488 

42 

504 

42 

521 

42 

537 

42 

553 

42 

570 

42 

586 

42 

602 

42 

610 

42 

636 

267 

42 

651 

42 

667 

42 

684 

42 

700 

42 

716 

42 

732 

42 

749 

42 

765 

42 

781 

42 

797 

268 

42 

813 

42 

830 

42 

846 

42 

862 

42 

878 

42 

894 

42 

911

42 

927 

42 

043 

42 

050 

269

42 

076 

42 

001 

43 

008 

43 

024 

43 

040 

43 

056 

43 

072 

43 

088 

43 

104 

43 

120 

270

43 

130 

43 

152 

43 

169 

43 

185 

43 

201 

43 

217 

43 

233 

43 

240 

43 

265 

43 

281 

271 

43 

207 

43 

313 

43 

329 

43 

345 

43 

361 

43 

377 

43 

393 

43 

409 

43 

425 

43 

441 

272 

43 

457 

43 

473 

43 

489 

43 

505 

43 

521 

43 

537 

43 

553 

43 

569 

43 

584 

43 

600 

273 

43 

616 

43 

632 

43 

648 

43 

664 

43 

680 

43 

606 

43 

712 

43 

727 

43 

743 

43 

769 

274 

43 

776 

43 

701 

43 

807 

43 

823 

43 

838 

43 

854 

43 

870 

43 

886 

43 

902 

43 

017 

275

43 

933 

43 

949 

43 

965 

43 

081 

43 

090 

44 

012 

44 

028 

44 

044 

44 

060 

44 

076 

276 

44 

001 

44 

107 

44 

122 

44 

138 

44 

154 

44 

170 

44 

185 

44 

201 

44 

217 

44 

232 

277 

44 

248 

44 

264 

44 

279 

44 

295 

44 

311 

44 

326 

44 

342 

44 

358 

44 

878 

44 

389 

278 

44 

404 

44 

420 

44 

436 

44 

451 

44 

467 

44 

483 

44 

498 

44 

514 

44 

529 

44 

546 

279 

44 

560 

44 

676 

44 

602 

44 

607 

44 

623 

44 

638 

44 

654 

44 

669 

44 

685 

44 

700 

280 

44 

710 

44 

731 

44 

747 

44 

762 

44 

778 

44 

793 

44 

809 

44 

824 

44 

840 

44 

856 

281 

44 

871 

44 

886 

44 

002 

44 

917 

44 

032 

44 

048 

44 

063 

44 

979 

44 

004 

46 

010 

282 

45 

025 

45 

040 

45 

056 

46 

071 

45 

086 

45 

102 

45 

117 

45 

133 

45 

148 

46 

163 

283 

45 

179 

46 

104 

45 

209 

46 

225 

45 

240 

45 

255 

45 

271 

45 

286 

45 

801 

46 

317 

284 

45 

332 

45 

847 

45 

362 

46 

878 

45 

893 

45 

408 

45 

423 

45 

439 

45 

454 

46 

469 

285 

46 

484 

46 

600 

46 

615 

46 

530 

46 

646 

46 

661 

45 

676 

45 

601 

46 

600 

46 

621 

286 

46 

637 

46 

662 

46 

667 

46 

682 

46 

697 

45 

712 

45 

728 

46 

743 

46 

758 

45 

773 

287 

46 

788 

45 

803 

45 

818 

45 

834 

46 

849 

45 

864 

45 

879 

46 

804 

45 

009 

46 

024 

288 

46 

039 

46 

054 

46 

069 

46 

984 

46 

000 

40 

015 

46 

030 

46 

045 

46 

060 

46 

075 

289 

46 

090 

46 

105 

46 

120 

46 

135 

46 

150 

46 

165 

46 

180 

46 

105 

46 

210 

46 

225 

290 

46 

240 

46 

255 

40 

270 

40 

285 

40 

800 

46 

815 

46 

330 

46 

845 

40 

859 

46 

374 

201 

46 

889 

46 

404 

46 

419 

46 

434 

46 

449 

46 

464 

46 

479 

46 

404 

46 

509 

46 

523 

202 

46 

538 

46 

653 

46 

668 

46 

583 

46 

598 

46 

613 

46 

627 

46 

642 

46 

657 

46 

672 

203 

46 

687 

46 

702 

46 

716 

46 

731 

40 

746 

46 

761 

46 

776 

46 

700 

46 

805 

46 

820 

204 

46 

835 

46 

850 

46 

864 

46 

870 

46 

804 

46 

009 

46 

923 

46 

938 

46 

063 

46 

067 

205 

46 

082 

46 

007 

47 

012 

47 

020 

47 

041 

47 

056 

47 

070 

47 

086 

47 

100 

47 

114 

296 

47 

129 

47 

144 

47 

169 

47 

173 

47 

188 

47 

202 

47 

217 

47 

232 

47 

240 

47 

261 

297 

47 

276 

47 

200 

47 

805 

47 

810 

47 

334 

47 

349 

47 

863 

47 

878 

47 

802 

47 

407 

208 

47 

422 

47 

436 

47 

461 

47 

466 

47 

480 

47 

494 

47 

509 

47 

624 

47 

638 

47 

653 

290 

47 

667 

47 

682 

47 

606 

47 

611 

47 

625 

47 

640 

47 

654 

47 

669 

47 

683 

47 

608 

No; 


0 


1 


2 


8 


4 

E 

6 


6 


7 


8 




260-299 
















Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

300-349

No.       0       1       2       3       4       5       6       7       8       9
300  47 712  47 727  47 741  47 756  47 770  47 784  47 799  47 813  47 828  47 842
301  47 857  47 871  47 885  47 900  47 914  47 929  47 943  47 958  47 972  47 986
302  48 001  48 015  48 029  48 044  48 058  48 073  48 087  48 101  48 116  48 130
303  48 144  48 159  48 173  48 187  48 202  48 216  48 230  48 244  48 259  48 273
304  48 287  48 302  48 316  48 330  48 344  48 359  48 373  48 387  48 401  48 416
305  48 430  48 444  48 458  48 473  48 487  48 501  48 515  48 530  48 544  48 558
306  48 572  48 586  48 601  48 615  48 629  48 643  48 657  48 671  48 686  48 700
307  48 714  48 728  48 742  48 756  48 770  48 785  48 799  48 813  48 827  48 841
308  48 855  48 869  48 883  48 897  48 911  48 926  48 940  48 954  48 968  48 982
309  48 996  49 010  49 024  49 038  49 052  49 066  49 080  49 094  49 108  49 122
310  49 136  49 150  49 164  49 178  49 192  49 206  49 220  49 234  49 248  49 262
311  49 276  49 290  49 304  49 318  49 332  49 346  49 360  49 374  49 388  49 402
312  49 415  49 429  49 443  49 457  49 471  49 485  49 499  49 513  49 527  49 541
313  49 554  49 568  49 582  49 596  49 610  49 624  49 638  49 651  49 665  49 679
314  49 693  49 707  49 721  49 734  49 748  49 762  49 776  49 790  49 803  49 817
315  49 831  49 845  49 859  49 872  49 886  49 900  49 914  49 927  49 941  49 955
316  49 969  49 982  49 996  50 010  50 024  50 037  50 051  50 065  50 079  50 092
317  50 106  50 120  50 133  50 147  50 161  50 174  50 188  50 202  50 215  50 229
318  50 243  50 256  50 270  50 284  50 297  50 311  50 325  50 338  50 352  50 365
319  50 379  50 393  50 406  50 420  50 433  50 447  50 461  50 474  50 488  50 501
320  50 515  50 529  50 542  50 556  50 569  50 583  50 596  50 610  50 623  50 637
321  50 651  50 664  50 678  50 691  50 705  50 718  50 732  50 745  50 759  50 772
322  50 786  50 799  50 813  50 826  50 840  50 853  50 866  50 880  50 893  50 907
323  50 920  50 934  50 947  50 961  50 974  50 987  51 001  51 014  51 028  51 041
324  51 055  51 068  51 081  51 095  51 108  51 121  51 135  51 148  51 162  51 175
325  51 188  51 202  51 215  51 228  51 242  51 255  51 268  51 282  51 295  51 308
326  51 322  51 335  51 348  51 362  51 375  51 388  51 402  51 415  51 428  51 441
327  51 455  51 468  51 481  51 495  51 508  51 521  51 534  51 548  51 561  51 574
328  51 587  51 601  51 614  51 627  51 640  51 654  51 667  51 680  51 693  51 706
329  51 720  51 733  51 746  51 759  51 772  51 786  51 799  51 812  51 825  51 838
330  51 851  51 865  51 878  51 891  51 904  51 917  51 930  51 943  51 957  51 970
331  51 983  51 996  52 009  52 022  52 035  52 048  52 061  52 075  52 088  52 101
332  52 114  52 127  52 140  52 153  52 166  52 179  52 192  52 205  52 218  52 231
333  52 244  52 257  52 270  52 284  52 297  52 310  52 323  52 336  52 349  52 362
334  52 375  52 388  52 401  52 414  52 427  52 440  52 453  52 466  52 479  52 492
335  52 504  52 517  52 530  52 543  52 556  52 569  52 582  52 595  52 608  52 621
336  52 634  52 647  52 660  52 673  52 686  52 699  52 711  52 724  52 737  52 750
337  52 763  52 776  52 789  52 802  52 815  52 827  52 840  52 853  52 866  52 879
338  52 892  52 905  52 917  52 930  52 943  52 956  52 969  52 982  52 994  53 007
339  53 020  53 033  53 046  53 058  53 071  53 084  53 097  53 110  53 122  53 135
340  53 148  53 161  53 173  53 186  53 199  53 212  53 224  53 237  53 250  53 263
341  53 275  53 288  53 301  53 314  53 326  53 339  53 352  53 364  53 377  53 390
342  53 403  53 415  53 428  53 441  53 453  53 466  53 479  53 491  53 504  53 517
343  53 529  53 542  53 555  53 567  53 580  53 593  53 605  53 618  53 631  53 643
344  53 656  53 668  53 681  53 694  53 706  53 719  53 732  53 744  53 757  53 769
345  53 782  53 794  53 807  53 820  53 832  53 845  53 857  53 870  53 882  53 895
346  53 908  53 920  53 933  53 945  53 958  53 970  53 983  53 995  54 008  54 020
347  54 033  54 045  54 058  54 070  54 083  54 095  54 108  54 120  54 133  54 145
348  54 158  54 170  54 183  54 195  54 208  54 220  54 233  54 245  54 258  54 270
349  54 283  54 295  54 307  54 320  54 332  54 345  54 357  54 370  54 382  54 394

















Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

350-399

No.       0       1       2       3       4       5       6       7       8       9
350  54 407  54 419  54 432  54 444  54 456  54 469  54 481  54 494  54 506  54 518
351  54 531  54 543  54 555  54 568  54 580  54 593  54 605  54 617  54 630  54 642
352  54 654  54 667  54 679  54 691  54 704  54 716  54 728  54 741  54 753  54 765
353  54 777  54 790  54 802  54 814  54 827  54 839  54 851  54 864  54 876  54 888
354  54 900  54 913  54 925  54 937  54 949  54 962  54 974  54 986  54 998  55 011
355  55 023  55 035  55 047  55 060  55 072  55 084  55 096  55 108  55 121  55 133
356  55 145  55 157  55 169  55 182  55 194  55 206  55 218  55 230  55 242  55 255
357  55 267  55 279  55 291  55 303  55 315  55 328  55 340  55 352  55 364  55 376
358  55 388  55 400  55 413  55 425  55 437  55 449  55 461  55 473  55 485  55 497
359  55 509  55 522  55 534  55 546  55 558  55 570  55 582  55 594  55 606  55 618
360  55 630  55 642  55 654  55 666  55 678  55 691  55 703  55 715  55 727  55 739
361  55 751  55 763  55 775  55 787  55 799  55 811  55 823  55 835  55 847  55 859
362  55 871  55 883  55 895  55 907  55 919  55 931  55 943  55 955  55 967  55 979
363  55 991  56 003  56 015  56 027  56 038  56 050  56 062  56 074  56 086  56 098
364  56 110  56 122  56 134  56 146  56 158  56 170  56 182  56 194  56 205  56 217
365  56 229  56 241  56 253  56 265  56 277  56 289  56 301  56 312  56 324  56 336
366  56 348  56 360  56 372  56 384  56 396  56 407  56 419  56 431  56 443  56 455
367  56 467  56 478  56 490  56 502  56 514  56 526  56 538  56 549  56 561  56 573
368  56 585  56 597  56 608  56 620  56 632  56 644  56 656  56 667  56 679  56 691
369  56 703  56 714  56 726  56 738  56 750  56 761  56 773  56 785  56 797  56 808
370  56 820  56 832  56 844  56 855  56 867  56 879  56 891  56 902  56 914  56 926
371  56 937  56 949  56 961  56 972  56 984  56 996  57 008  57 019  57 031  57 043
372  57 054  57 066  57 078  57 089  57 101  57 113  57 124  57 136  57 148  57 159
373  57 171  57 183  57 194  57 206  57 217  57 229  57 241  57 252  57 264  57 276
374  57 287  57 299  57 310  57 322  57 334  57 345  57 357  57 368  57 380  57 392
375  57 403  57 415  57 426  57 438  57 449  57 461  57 473  57 484  57 496  57 507
376  57 519  57 530  57 542  57 553  57 565  57 576  57 588  57 600  57 611  57 623
377  57 634  57 646  57 657  57 669  57 680  57 692  57 703  57 715  57 726  57 738
378  57 749  57 761  57 772  57 784  57 795  57 807  57 818  57 830  57 841  57 852
379  57 864  57 875  57 887  57 898  57 910  57 921  57 933  57 944  57 955  57 967
380  57 978  57 990  58 001  58 013  58 024  58 035  58 047  58 058  58 070  58 081
381  58 092  58 104  58 115  58 127  58 138  58 149  58 161  58 172  58 184  58 195
382  58 206  58 218  58 229  58 240  58 252  58 263  58 274  58 286  58 297  58 309
383  58 320  58 331  58 343  58 354  58 365  58 377  58 388  58 399  58 410  58 422
384  58 433  58 444  58 456  58 467  58 478  58 490  58 501  58 512  58 524  58 535
385  58 546  58 557  58 569  58 580  58 591  58 602  58 614  58 625  58 636  58 647
386  58 659  58 670  58 681  58 692  58 704  58 715  58 726  58 737  58 749  58 760
387  58 771  58 782  58 794  58 805  58 816  58 827  58 838  58 850  58 861  58 872
388  58 883  58 894  58 906  58 917  58 928  58 939  58 950  58 961  58 973  58 984
389  58 995  59 006  59 017  59 028  59 040  59 051  59 062  59 073  59 084  59 095
390  59 106  59 118  59 129  59 140  59 151  59 162  59 173  59 184  59 195  59 207
391  59 218  59 229  59 240  59 251  59 262  59 273  59 284  59 295  59 306  59 318
392  59 329  59 340  59 351  59 362  59 373  59 384  59 395  59 406  59 417  59 428
393  59 439  59 450  59 461  59 472  59 483  59 494  59 506  59 517  59 528  59 539
394  59 550  59 561  59 572  59 583  59 594  59 605  59 616  59 627  59 638  59 649
395  59 660  59 671  59 682  59 693  59 704  59 715  59 726  59 737  59 748  59 759
396  59 770  59 780  59 791  59 802  59 813  59 824  59 835  59 846  59 857  59 868
397  59 879  59 890  59 901  59 912  59 923  59 934  59 945  59 956  59 966  59 977
398  59 988  59 999  60 010  60 021  60 032  60 043  60 054  60 065  60 076  60 086
399  60 097  60 108  60 119  60 130  60 141  60 152  60 163  60 173  60 184  60 195

















Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

400-449

No.       0       1       2       3       4       5       6       7       8       9
400  60 206  60 217  60 228  60 239  60 249  60 260  60 271  60 282  60 293  60 304
401  60 314  60 325  60 336  60 347  60 358  60 369  60 379  60 390  60 401  60 412
402  60 423  60 433  60 444  60 455  60 466  60 477  60 487  60 498  60 509  60 520
403  60 531  60 541  60 552  60 563  60 574  60 584  60 595  60 606  60 617  60 627
404  60 638  60 649  60 660  60 670  60 681  60 692  60 703  60 713  60 724  60 735
405  60 746  60 756  60 767  60 778  60 788  60 799  60 810  60 821  60 831  60 842
406  60 853  60 863  60 874  60 885  60 895  60 906  60 917  60 927  60 938  60 949
407  60 959  60 970  60 981  60 991  61 002  61 013  61 023  61 034  61 045  61 055
408  61 066  61 077  61 087  61 098  61 109  61 119  61 130  61 140  61 151  61 162
409  61 172  61 183  61 194  61 204  61 215  61 225  61 236  61 247  61 257  61 268
410  61 278  61 289  61 300  61 310  61 321  61 331  61 342  61 352  61 363  61 374
411  61 384  61 395  61 405  61 416  61 426  61 437  61 448  61 458  61 469  61 479
412  61 490  61 500  61 511  61 521  61 532  61 542  61 553  61 563  61 574  61 584
413  61 595  61 606  61 616  61 627  61 637  61 648  61 658  61 669  61 679  61 690
414  61 700  61 711  61 721  61 731  61 742  61 752  61 763  61 773  61 784  61 794
415  61 805  61 815  61 826  61 836  61 847  61 857  61 868  61 878  61 888  61 899
416  61 909  61 920  61 930  61 941  61 951  61 962  61 972  61 982  61 993  62 003
417  62 014  62 024  62 034  62 045  62 055  62 066  62 076  62 086  62 097  62 107
418  62 118  62 128  62 138  62 149  62 159  62 170  62 180  62 190  62 201  62 211
419  62 221  62 232  62 242  62 252  62 263  62 273  62 284  62 294  62 304  62 315
420  62 325  62 335  62 346  62 356  62 366  62 377  62 387  62 397  62 408  62 418
421  62 428  62 439  62 449  62 459  62 469  62 480  62 490  62 500  62 511  62 521
422  62 531  62 542  62 552  62 562  62 572  62 583  62 593  62 603  62 613  62 624
423  62 634  62 644  62 655  62 665  62 675  62 685  62 696  62 706  62 716  62 726
424  62 737  62 747  62 757  62 767  62 778  62 788  62 798  62 808  62 818  62 829
425  62 839  62 849  62 859  62 870  62 880  62 890  62 900  62 910  62 921  62 931
426  62 941  62 951  62 961  62 972  62 982  62 992  63 002  63 012  63 022  63 033
427  63 043  63 053  63 063  63 073  63 083  63 094  63 104  63 114  63 124  63 134
428  63 144  63 155  63 165  63 175  63 185  63 195  63 205  63 215  63 225  63 236
429  63 246  63 256  63 266  63 276  63 286  63 296  63 306  63 317  63 327  63 337
430  63 347  63 357  63 367  63 377  63 387  63 397  63 407  63 417  63 428  63 438
431  63 448  63 458  63 468  63 478  63 488  63 498  63 508  63 518  63 528  63 538
432  63 548  63 558  63 568  63 579  63 589  63 599  63 609  63 619  63 629  63 639
433  63 649  63 659  63 669  63 679  63 689  63 699  63 709  63 719  63 729  63 739
434  63 749  63 759  63 769  63 779  63 789  63 799  63 809  63 819  63 829  63 839
435  63 849  63 859  63 869  63 879  63 889  63 899  63 909  63 919  63 929  63 939
436  63 949  63 959  63 969  63 979  63 988  63 998  64 008  64 018  64 028  64 038
437  64 048  64 058  64 068  64 078  64 088  64 098  64 108  64 118  64 128  64 137
438  64 147  64 157  64 167  64 177  64 187  64 197  64 207  64 217  64 227  64 237
439  64 246  64 256  64 266  64 276  64 286  64 296  64 306  64 316  64 326  64 335
440  64 345  64 355  64 365  64 375  64 385  64 395  64 404  64 414  64 424  64 434
441  64 444  64 454  64 464  64 473  64 483  64 493  64 503  64 513  64 523  64 532
442  64 542  64 552  64 562  64 572  64 582  64 591  64 601  64 611  64 621  64 631
443  64 640  64 650  64 660  64 670  64 680  64 689  64 699  64 709  64 719  64 729
444  64 738  64 748  64 758  64 768  64 777  64 787  64 797  64 807  64 816  64 826
445  64 836  64 846  64 856  64 865  64 875  64 885  64 895  64 904  64 914  64 924
446  64 933  64 943  64 953  64 963  64 972  64 982  64 992  65 002  65 011  65 021
447  65 031  65 040  65 050  65 060  65 070  65 079  65 089  65 099  65 108  65 118
448  65 128  65 137  65 147  65 157  65 167  65 176  65 186  65 196  65 205  65 215
449  65 225  65 234  65 244  65 254  65 263  65 273  65 283  65 292  65 302  65 312












Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

450-499

No.       0       1       2       3       4       5       6       7       8       9
450  65 321  65 331  65 341  65 350  65 360  65 369  65 379  65 389  65 398  65 408
451  65 418  65 427  65 437  65 447  65 456  65 466  65 475  65 485  65 495  65 504
452  65 514  65 523  65 533  65 543  65 552  65 562  65 571  65 581  65 591  65 600
453  65 610  65 619  65 629  65 639  65 648  65 658  65 667  65 677  65 686  65 696
454  65 706  65 715  65 725  65 734  65 744  65 753  65 763  65 772  65 782  65 792
455  65 801  65 811  65 820  65 830  65 839  65 849  65 858  65 868  65 877  65 887
456  65 896  65 906  65 916  65 925  65 935  65 944  65 954  65 963  65 973  65 982
457  65 992  66 001  66 011  66 020  66 030  66 039  66 049  66 058  66 068  66 077
458  66 087  66 096  66 106  66 115  66 124  66 134  66 143  66 153  66 162  66 172
459  66 181  66 191  66 200  66 210  66 219  66 229  66 238  66 247  66 257  66 266
460  66 276  66 285  66 295  66 304  66 314  66 323  66 332  66 342  66 351  66 361
461  66 370  66 380  66 389  66 398  66 408  66 417  66 427  66 436  66 445  66 455
462  66 464  66 474  66 483  66 492  66 502  66 511  66 521  66 530  66 539  66 549
463  66 558  66 567  66 577  66 586  66 596  66 605  66 614  66 624  66 633  66 642
464  66 652  66 661  66 671  66 680  66 689  66 699  66 708  66 717  66 727  66 736
465  66 745  66 755  66 764  66 773  66 783  66 792  66 801  66 811  66 820  66 829
466  66 839  66 848  66 857  66 867  66 876  66 885  66 894  66 904  66 913  66 922
467  66 932  66 941  66 950  66 960  66 969  66 978  66 987  66 997  67 006  67 015
468  67 025  67 034  67 043  67 052  67 062  67 071  67 080  67 089  67 099  67 108
469  67 117  67 127  67 136  67 145  67 154  67 164  67 173  67 182  67 191  67 201
470  67 210  67 219  67 228  67 237  67 247  67 256  67 265  67 274  67 284  67 293
471  67 302  67 311  67 321  67 330  67 339  67 348  67 357  67 367  67 376  67 385
472  67 394  67 403  67 413  67 422  67 431  67 440  67 449  67 459  67 468  67 477
473  67 486  67 495  67 504  67 514  67 523  67 532  67 541  67 550  67 560  67 569
474  67 578  67 587  67 596  67 605  67 614  67 624  67 633  67 642  67 651  67 660
475  67 669  67 679  67 688  67 697  67 706  67 715  67 724  67 733  67 742  67 752
476  67 761  67 770  67 779  67 788  67 797  67 806  67 815  67 825  67 834  67 843
477  67 852  67 861  67 870  67 879  67 888  67 897  67 906  67 916  67 925  67 934
478  67 943  67 952  67 961  67 970  67 979  67 988  67 997  68 006  68 015  68 024
479  68 034  68 043  68 052  68 061  68 070  68 079  68 088  68 097  68 106  68 115
480  68 124  68 133  68 142  68 151  68 160  68 169  68 178  68 187  68 196  68 205
481  68 215  68 224  68 233  68 242  68 251  68 260  68 269  68 278  68 287  68 296
482  68 305  68 314  68 323  68 332  68 341  68 350  68 359  68 368  68 377  68 386
483  68 395  68 404  68 413  68 422  68 431  68 440  68 449  68 458  68 467  68 476
484  68 485  68 494  68 502  68 511  68 520  68 529  68 538  68 547  68 556  68 565
485  68 574  68 583  68 592  68 601  68 610  68 619  68 628  68 637  68 646  68 655
486  68 664  68 673  68 681  68 690  68 699  68 708  68 717  68 726  68 735  68 744
487  68 753  68 762  68 771  68 780  68 789  68 797  68 806  68 815  68 824  68 833
488  68 842  68 851  68 860  68 869  68 878  68 886  68 895  68 904  68 913  68 922
489  68 931  68 940  68 949  68 958  68 966  68 975  68 984  68 993  69 002  69 011
490  69 020  69 028  69 037  69 046  69 055  69 064  69 073  69 082  69 090  69 099
491  69 108  69 117  69 126  69 135  69 144  69 152  69 161  69 170  69 179  69 188
492  69 197  69 205  69 214  69 223  69 232  69 241  69 249  69 258  69 267  69 276
493  69 285  69 294  69 302  69 311  69 320  69 329  69 338  69 346  69 355  69 364
494  69 373  69 381  69 390  69 399  69 408  69 417  69 425  69 434  69 443  69 452
495  69 461  69 469  69 478  69 487  69 496  69 504  69 513  69 522  69 531  69 539
496  69 548  69 557  69 566  69 574  69 583  69 592  69 601  69 609  69 618  69 627
497  69 636  69 644  69 653  69 662  69 671  69 679  69 688  69 697  69 705  69 714
498  69 723  69 732  69 740  69 749  69 758  69 767  69 775  69 784  69 793  69 801
499  69 810  69 819  69 827  69 836  69 845  69 854  69 862  69 871  69 880  69 888
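The entries of Table 7 are the decimal parts (mantissas) of the common logarithms, rounded to five places; the row label supplies the first three digits of the number and the column heading the fourth. As a minimal sketch for checking a doubtful entry, and assuming only Python's standard `math.log10`, a row may be recomputed as follows. The function names `mantissa` and `table_row` are illustrative and do not come from the text.

```python
import math


def mantissa(n):
    """Return the five-place mantissa of log10(n), grouped as the table prints it."""
    frac = math.log10(n) % 1.0              # drop the characteristic
    digits = f"{round(frac * 100000):05d}"  # five places, zero-padded
    return f"{digits[:2]} {digits[2:]}"


def table_row(label):
    """Recompute the ten entries for a three-digit row label (columns 0 to 9)."""
    return [mantissa(label * 10 + col) for col in range(10)]


# Print the row for 450 in the layout of the table.
print(450, *table_row(450), sep="  ")
```

The first entry of row 450 comes out as 65 321, so that log 4500 = 3.65321 once the characteristic 3 is supplied.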













Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

500-549

No.       0       1       2       3       4       5       6       7       8       9
500  69 897  69 906  69 914  69 923  69 932  69 940  69 949  69 958  69 966  69 975
501  69 984  69 992  70 001  70 010  70 018  70 027  70 036  70 044  70 053  70 062
502  70 070  70 079  70 088  70 096  70 105  70 114  70 122  70 131  70 140  70 148
503  70 157  70 165  70 174  70 183  70 191  70 200  70 209  70 217  70 226  70 234
504  70 243  70 252  70 260  70 269  70 278  70 286  70 295  70 303  70 312  70 321
505  70 329  70 338  70 346  70 355  70 364  70 372  70 381  70 389  70 398  70 406
506  70 415  70 424  70 432  70 441  70 449  70 458  70 467  70 475  70 484  70 492
507  70 501  70 509  70 518  70 526  70 535  70 544  70 552  70 561  70 569  70 578
508  70 586  70 595  70 603  70 612  70 621  70 629  70 638  70 646  70 655  70 663
509  70 672  70 680  70 689  70 697  70 706  70 714  70 723  70 731  70 740  70 749
510  70 757  70 766  70 774  70 783  70 791  70 800  70 808  70 817  70 825  70 834
511  70 842  70 851  70 859  70 868  70 876  70 885  70 893  70 902  70 910  70 919
512  70 927  70 935  70 944  70 952  70 961  70 969  70 978  70 986  70 995  71 003
513  71 012  71 020  71 029  71 037  71 046  71 054  71 063  71 071  71 079  71 088
514  71 096  71 105  71 113  71 122  71 130  71 139  71 147  71 155  71 164  71 172
515  71 181  71 189  71 198  71 206  71 214  71 223  71 231  71 240  71 248  71 257
516  71 265  71 273  71 282  71 290  71 299  71 307  71 315  71 324  71 332  71 341
517  71 349  71 357  71 366  71 374  71 383  71 391  71 399  71 408  71 416  71 425
518  71 433  71 441  71 450  71 458  71 466  71 475  71 483  71 492  71 500  71 508
519  71 517  71 525  71 533  71 542  71 550  71 559  71 567  71 575  71 584  71 592
520  71 600  71 609  71 617  71 625  71 634  71 642  71 650  71 659  71 667  71 675
521  71 684  71 692  71 700  71 709  71 717  71 725  71 734  71 742  71 750  71 759
522  71 767  71 775  71 784  71 792  71 800  71 809  71 817  71 825  71 834  71 842
523  71 850  71 858  71 867  71 875  71 883  71 892  71 900  71 908  71 917  71 925
524  71 933  71 941  71 950  71 958  71 966  71 975  71 983  71 991  71 999  72 008
525  72 016  72 024  72 032  72 041  72 049  72 057  72 066  72 074  72 082  72 090
526  72 099  72 107  72 115  72 123  72 132  72 140  72 148  72 156  72 165  72 173
527  72 181  72 189  72 198  72 206  72 214  72 222  72 230  72 239  72 247  72 255
528  72 263  72 272  72 280  72 288  72 296  72 304  72 313  72 321  72 329  72 337
529  72 346  72 354  72 362  72 370  72 378  72 387  72 395  72 403  72 411  72 419
530  72 428  72 436  72 444  72 452  72 460  72 469  72 477  72 485  72 493  72 501
531  72 509  72 518  72 526  72 534  72 542  72 550  72 558  72 567  72 575  72 583
532  72 591  72 599  72 607  72 616  72 624  72 632  72 640  72 648  72 656  72 665
533  72 673  72 681  72 689  72 697  72 705  72 713  72 722  72 730  72 738  72 746
534  72 754  72 762  72 770  72 779  72 787  72 795  72 803  72 811  72 819  72 827
535  72 835  72 843  72 852  72 860  72 868  72 876  72 884  72 892  72 900  72 908
536  72 916  72 925  72 933  72 941  72 949  72 957  72 965  72 973  72 981  72 989
537  72 997  73 006  73 014  73 022  73 030  73 038  73 046  73 054  73 062  73 070
538  73 078  73 086  73 094  73 102  73 111  73 119  73 127  73 135  73 143  73 151
539  73 159  73 167  73 175  73 183  73 191  73 199  73 207  73 215  73 223  73 231
540  73 239  73 247  73 255  73 263  73 272  73 280  73 288  73 296  73 304  73 312
541  73 320  73 328  73 336  73 344  73 352  73 360  73 368  73 376  73 384  73 392
542  73 400  73 408  73 416  73 424  73 432  73 440  73 448  73 456  73 464  73 472
543  73 480  73 488  73 496  73 504  73 512  73 520  73 528  73 536  73 544  73 552
544  73 560  73 568  73 576  73 584  73 592  73 600  73 608  73 616  73 624  73 632
545  73 640  73 648  73 656  73 664  73 672  73 679  73 687  73 695  73 703  73 711
546  73 719  73 727  73 735  73 743  73 751  73 759  73 767  73 775  73 783  73 791
547  73 799  73 807  73 815  73 823  73 830  73 838  73 846  73 854  73 862  73 870
548  73 878  73 886  73 894  73 902  73 910  73 918  73 926  73 933  73 941  73 949
549  73 957  73 965  73 973  73 981  73 989  73 997  74 005  74 013  74 020  74 028

















Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

650-699

No.       0       1       2       3       4       5       6       7       8       9

660 

81 

201 

81 

298 

81 

305 

81 

311 

81 

318 

81 

325 

81 

331 

81 

338 

81 

345 

81 

351 

651 

81 

358 

81 

865 

81 

371 

81 

378 

81 

385 

81 

391 

81 

398 

81 

405 

81 

411 

81 

418 

652 

81 

425 

81 

431 

81 

438 

81 

445 

81 

451 

81 

458 

81 

465 

81 

471 

81 

478 

81 

485 

653 

81 

491 

81 

498 

81 

505 

81 

511 

81 

518 

81 

525 

81 

531 

81 

538 

81 

544 

81 

551 

654 

81 

558 

81 

564 

81 

571 

81 

578 

81 

584 

81 

591 

81 

508 

81 

604 

81 

611 

81 

617 

655 

81 

624 

81 

631 

81 

637 

81 

644 

81 

651 

81 

667 

81 

664 

81 

671 

81 

677 

81 

684 

656 

81 

690 

81 

697 

81 

704 

81 

710 

81 

717 

81 

723 

81 

730 

81 

737 

81 

743 

81 

750 

657 

81 

757 

81 

763 

81 

770 

81 

776 

81 

783 

81 

790 

81 

796 

81 

803 

81 

809 

81 

816 

658 

81 

823 

81 

829 

81 

836 

81 

842 

81 

849 

81 

856 

81 

862 

81 

869 

81 

875 

81 

882 

659 

81 

880 

81 

895 

81 

902 

81 

908 

81 

915 

81 

921 

81 

928 

81 

935 

81 

941 

81 

948 

660 

81 

954 

81 

961 

81 

968 

81 

074 

81 

981 

81 

987 

81 

904 

82 

000 

82 

007 

82 

014 

661 

82 

020 

82 

027 

82 

033 

82 

040 

82 

046 

82 

053 

82 

060 

82 

066 

82 

073 

82 

079 

662 

82 

086 

82 

002 

82 

099 

82 

105 

82 

112 

82 

119 

82 

125 

82 

132 

82 

138 

82 

145 

663 

82 

161 

82 

158 

82 

164 

82 

171 

82 

178 

82 

184 

82 

191 

82 

197 

82 

204 

82 

210 

664 

82 

217 

82 

223 

82 

230 

82 

236 

82 

243 

82 

249 

82 

256 

82 

263 

82 

269 

82 

276 

665 

82 

282 

82 

289 

82 

295 

82 

302 

82 

308 

82 

315 

82 

821 

82 

328 

82 

334 

82 

341 

666 

82 

347 

82 

354 

82 

360 

82 

367 

82 

373 

82 

380 

82 

387 

82 

393 

82 

400 

82 

406 

667 

82 

413 

82 

419 

82 

426 

82 

432 

82 

439 

82 

445 

82 

452 

82 

458 

82 

465 

82 

471 

668 

82 

478 

82 

484 

82 

491 

82 

497 

82 

504 

82 

510 

82 

517 

82 

523 

82 

530 

82 

536 

660 

82 

543 

82 

540 

82 

556 

82 

562 

82 

569 

82 

575 

82 

582 

82 

588 

82 

595 

82 

601 

670 

82 

607 

82 

614 

82 

620 

82 

627 

82 

633 

82 

640 

82 

646 

82 

653 

82 

659 

82 

666 

671 

82 

672 

82 

679 

82 

685 

82 

692 

82 

698 

82 

705 

82 

711 

82 

718 

82 

724 

82 

730 

672 

82 

737 

82 

743 

82 

750 

82 

756 

82 

763 

82 

769 

82 

776 

82 

782 

82 

789 

82 

796 

673 

82 

802 

82 

808 

82 

814 

82 

821 

82 

827 

82 

834 

82 

840 

82 

847 

82 

853 

82 

860 

674 

82 

866 

82 

872 

82 

879 

82 

883 

82 

892 

82 

898 

82 

905 

82 

911 

82 

918 

82 

924 

675 

82 

930 

82 

937 

82 

943 

82 

050 

82 

056 

82 

963 

82 

969 

82 

075 

82 

982 

82 

988 

676 

82 

995 

83 

001 

83 

008 

83 

014 

83 

020 

83 

027 

83 

033 

83 

040 

83 

046 

83 

052 

677 

83 

059 

83 

065 

83 

072 

83 

078 

83 

085 

83 

091 

83 

097 

83 

104 

83 

110 

83 

117 

678 

83 

123 

83 

129 

83 

136 

83 

142 

83 

149 

83 

155 

83 

161 

83 

168 

83 

174 

83 

181 

679 

83 

187 

83 

193 

83 

200 

83 

206 

83 

213 

83 

219 

83 

225 

83 

232 

83 

238 

83 

245 

680 

83 

251 

83 

257 

83 

264 

83 

270 

83 

276 

83 

283 

83 

289 

83 

296 

83 

302 

83 

808 

681 

83 

315 

83 

321 

83 

327 

83 

334 

83 

340 

83 

347 

83 

353 

83 

359 

83 

366 

83 

372 

682 

83 

378 

83 

385 

83 

391 

83 

308 

83 

404 

83 

410 

83 

417 

83 

423 

83 

429 

83 

436 

683 

83 

442 

83 

448 

83 

455 

83 

461 

83 

467 

83 

474 

83 

480 

83 

487 

83 

493 

83 

499 

684 

83 

506 

83 

512 

83 

518 

83 

525 

83 

531 

83 

537 

83 

544 

83 

550 

83 

556 

83 

663 

685 

83 

569 

83 

675 

83 

582 

83 

588 

83 

594 

83 

601 

83 

607 

83 

613 

83 

620 

88 

626 

686 

83 

632 

83 

639 

83 

645 

83 

651 

83 

658 

83 

664 

83 

670 

83 

677 

83 

683 

83 

680 

687 

83 

696 

83 

702 

83 

708 

83 

715 

83 

721 

83 

727 

83 

734 

83 

740 

83 

746 

83 

753 

688 

88 

759 

83 

765 

83 

771 

83 

778 

83 

784 

83 

790 

83 

797 

83 

803 

83 

809 

83 

816 

689 

83 

822 

83 

828 

83 

835 

83 

841 

83 

847 

83 

853 

83 

860 

83 

866 

83 

872 

88 

870 

690 

83 

885 

83 

891 

83 

897 

83 

004 

83 

910 

83 

016 

83 

923 

83 

929 

83 

935 

88 

942 

1 691 

83 

048 

83 

954 

83 

060 

83 

967 

83 

073 

83 

079 

83 

985 

83 

992 

83 

998 

84 

004 

692 

84 

Oil 

84 

017 

84 

023 

84 

029 

84 

036 

84 

042 

84 

048 

84 

055 

84 

061 

84 

067 

693 

84 

073 

84 

080 

84 

086 

84 

002 

84 

098 

84 

105 

84 

111 

84 

117 

84 

123 

84 

130 

1 604 

84 

136 

84 

142 

84 

148 

84 

155 

84 

161 

84 

167 

84 

173 

84 

180 

84 

186 

84 

102 

695 

84 

108 

84 

205 

84 

211 

84 

217 

84 

223 

84 

230 

84 

236 

84 

242 

84 

248 

84 

265 

606 

84 

261 

84 

267 

84 

273 

84 

280 

84 

286 

84 

292 

84 

298 

84 

805 

84 

311 

84 

817 

697 

84 

323 

84 

330 

84 

336 

84 

342 

84 

348 

84 

354 

84 

361 

84 

367 

84 

373 

84 

879 

608 

84 

386 

84 

802 

84 

398 

84 

404 

84 

410 

84 

417 

84 

423 

84 

429 

84 

435 

84 

442 

600 

84 

448 

84 

454 

84 

460 

84 

466 

84 

473 

84 

479 

84 


84 

491 

84 

497 

84 

604 

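Entries on these pages can be spot-checked mechanically. The following is a minimal Python sketch, not part of the original appendix. It assumes that each printed entry is the mantissa of the common logarithm of the four-figure number formed by the row number followed by the column digit, rounded to five places; the function names `mantissa` and `table_row` are illustrative only.

```python
import math


def mantissa(n: int, places: int = 5) -> str:
    """Mantissa of log10(n), rounded to `places` digits, as a zero-padded string."""
    frac = math.log10(n) % 1.0  # drop the characteristic, keep the fractional part
    return f"{round(frac * 10 ** places):0{places}d}"


def table_row(no: int) -> list:
    """Mantissas for the ten numbers no*10 + 0 .. no*10 + 9, i.e. one row of the table."""
    return [mantissa(no * 10 + k) for k in range(10)]


if __name__ == "__main__":
    # Print one row in the same layout as the table: row number, then columns 0 to 9.
    print(650, *table_row(650))
```

Running the script prints the row for No. 650, which can be compared digit by digit with the printed row. A disagreement in a single digit usually points to a misread or misprinted entry rather than to a difference in method.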












Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

700-749

No.      0      1      2      3      4      5      6      7      8      9

700  84510  84516  84522  84528  84535  84541  84547  84553  84559  84566
701  84572  84578  84584  84590  84597  84603  84609  84615  84621  84628
702  84634  84640  84646  84652  84658  84665  84671  84677  84683  84689
703  84696  84702  84708  84714  84720  84726  84733  84739  84745  84751
704  84757  84763  84770  84776  84782  84788  84794  84800  84807  84813
705  84819  84825  84831  84837  84844  84850  84856  84862  84868  84874
706  84880  84887  84893  84899  84905  84911  84917  84924  84930  84936
707  84942  84948  84954  84960  84967  84973  84979  84985  84991  84997
708  85003  85009  85016  85022  85028  85034  85040  85046  85052  85058
709  85065  85071  85077  85083  85089  85095  85101  85107  85114  85120

710  85126  85132  85138  85144  85150  85156  85163  85169  85175  85181
711  85187  85193  85199  85205  85211  85217  85224  85230  85236  85242
712  85248  85254  85260  85266  85272  85278  85285  85291  85297  85303
713  85309  85315  85321  85327  85333  85339  85345  85352  85358  85364
714  85370  85376  85382  85388  85394  85400  85406  85412  85418  85425
715  85431  85437  85443  85449  85455  85461  85467  85473  85479  85485
716  85491  85497  85503  85509  85516  85522  85528  85534  85540  85546
717  85552  85558  85564  85570  85576  85582  85588  85594  85600  85606
718  85612  85618  85625  85631  85637  85643  85649  85655  85661  85667
719  85673  85679  85685  85691  85697  85703  85709  85715  85721  85727

720  85733  85739  85745  85751  85757  85763  85769  85775  85781  85788
721  85794  85800  85806  85812  85818  85824  85830  85836  85842  85848
722  85854  85860  85866  85872  85878  85884  85890  85896  85902  85908
723  85914  85920  85926  85932  85938  85944  85950  85956  85962  85968
724  85974  85980  85986  85992  85998  86004  86010  86016  86022  86028
725  86034  86040  86046  86052  86058  86064  86070  86076  86082  86088
726  86094  86100  86106  86112  86118  86124  86130  86136  86141  86147
727  86153  86159  86165  86171  86177  86183  86189  86195  86201  86207
728  86213  86219  86225  86231  86237  86243  86249  86255  86261  86267
729  86273  86279  86285  86291  86297  86303  86308  86314  86320  86326

730  86332  86338  86344  86350  86356  86362  86368  86374  86380  86386
731  86392  86398  86404  86410  86415  86421  86427  86433  86439  86445
732  86451  86457  86463  86469  86475  86481  86487  86493  86499  86504
733  86510  86516  86522  86528  86534  86540  86546  86552  86558  86564
734  86570  86576  86581  86587  86593  86599  86605  86611  86617  86623
735  86629  86635  86641  86646  86652  86658  86664  86670  86676  86682
736  86688  86694  86700  86705  86711  86717  86723  86729  86735  86741
737  86747  86753  86759  86764  86770  86776  86782  86788  86794  86800
738  86806  86812  86817  86823  86829  86835  86841  86847  86853  86859
739  86864  86870  86876  86882  86888  86894  86900  86906  86911  86917

740  86923  86929  86935  86941  86947  86953  86958  86964  86970  86976
741  86982  86988  86994  86999  87005  87011  87017  87023  87029  87035
742  87040  87046  87052  87058  87064  87070  87075  87081  87087  87093
743  87099  87105  87111  87116  87122  87128  87134  87140  87146  87151
744  87157  87163  87169  87175  87181  87186  87192  87198  87204  87210
745  87216  87221  87227  87233  87239  87245  87251  87256  87262  87268
746  87274  87280  87286  87291  87297  87303  87309  87315  87320  87326
747  87332  87338  87344  87349  87355  87361  87367  87373  87379  87384
748  87390  87396  87402  87408  87413  87419  87425  87431  87437  87442
749  87448  87454  87460  87466  87471  87477  87483  87489  87495  87500















Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

750-799

No.      0      1      2      3      4      5      6      7      8      9

750  87506  87512  87518  87523  87529  87535  87541  87547  87552  87558
751  87564  87570  87576  87581  87587  87593  87599  87604  87610  87616
752  87622  87628  87633  87639  87645  87651  87656  87662  87668  87674
753  87679  87685  87691  87697  87703  87708  87714  87720  87726  87731
754  87737  87743  87749  87754  87760  87766  87772  87777  87783  87789
755  87795  87800  87806  87812  87818  87823  87829  87835  87841  87846
756  87852  87858  87864  87869  87875  87881  87887  87892  87898  87904
757  87910  87915  87921  87927  87933  87938  87944  87950  87955  87961
758  87967  87973  87978  87984  87990  87996  88001  88007  88013  88018
759  88024  88030  88036  88041  88047  88053  88058  88064  88070  88076

760  88081  88087  88093  88098  88104  88110  88116  88121  88127  88133
761  88138  88144  88150  88156  88161  88167  88173  88178  88184  88190
762  88195  88201  88207  88213  88218  88224  88230  88235  88241  88247
763  88252  88258  88264  88270  88275  88281  88287  88292  88298  88304
764  88309  88315  88321  88326  88332  88338  88343  88349  88355  88360
765  88366  88372  88377  88383  88389  88395  88400  88406  88412  88417
766  88423  88429  88434  88440  88446  88451  88457  88463  88468  88474
767  88480  88485  88491  88497  88502  88508  88513  88519  88525  88530
768  88536  88542  88547  88553  88559  88564  88570  88576  88581  88587
769  88593  88598  88604  88610  88615  88621  88627  88632  88638  88643

770  88649  88655  88660  88666  88672  88677  88683  88689  88694  88700
771  88705  88711  88717  88722  88728  88734  88739  88745  88750  88756
772  88762  88767  88773  88779  88784  88790  88795  88801  88807  88812
773  88818  88824  88829  88835  88840  88846  88852  88857  88863  88868
774  88874  88880  88885  88891  88897  88902  88908  88913  88919  88925
775  88930  88936  88941  88947  88953  88958  88964  88969  88975  88981
776  88986  88992  88997  89003  89009  89014  89020  89025  89031  89037
777  89042  89048  89053  89059  89064  89070  89076  89081  89087  89092
778  89098  89104  89109  89115  89120  89126  89131  89137  89143  89148
779  89154  89159  89165  89170  89176  89182  89187  89193  89198  89204

780  89209  89215  89221  89226  89232  89237  89243  89248  89254  89260
781  89265  89271  89276  89282  89287  89293  89298  89304  89310  89315
782  89321  89326  89332  89337  89343  89348  89354  89360  89365  89371
783  89376  89382  89387  89393  89398  89404  89409  89415  89421  89426
784  89432  89437  89443  89448  89454  89459  89465  89470  89476  89481
785  89487  89492  89498  89504  89509  89515  89520  89526  89531  89537
786  89542  89548  89553  89559  89564  89570  89575  89581  89586  89592
787  89597  89603  89609  89614  89620  89625  89631  89636  89642  89647
788  89653  89658  89664  89669  89675  89680  89686  89691  89697  89702
789  89708  89713  89719  89724  89730  89735  89741  89746  89752  89757

790  89763  89768  89774  89779  89785  89790  89796  89801  89807  89812
791  89818  89823  89829  89834  89840  89845  89851  89856  89862  89867
792  89873  89878  89883  89889  89894  89900  89905  89911  89916  89922
793  89927  89933  89938  89944  89949  89955  89960  89966  89971  89977
794  89982  89988  89993  89998  90004  90009  90015  90020  90026  90031
795  90037  90042  90048  90053  90059  90064  90069  90075  90080  90086
796  90091  90097  90102  90108  90113  90119  90124  90129  90135  90140
797  90146  90151  90157  90162  90168  90173  90179  90184  90189  90195
798  90200  90206  90211  90217  90222  90227  90233  90238  90244  90249
799  90255  90260  90266  90271  90276  90282  90287  90293  90298  90304













Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

800-849

No.      0      1      2      3      4      5      6      7      8      9

800  90309  90314  90320  90325  90331  90336  90342  90347  90352  90358
801  90363  90369  90374  90380  90385  90390  90396  90401  90407  90412
802  90417  90423  90428  90434  90439  90445  90450  90455  90461  90466
803  90472  90477  90482  90488  90493  90499  90504  90509  90515  90520
804  90526  90531  90536  90542  90547  90553  90558  90563  90569  90574
805  90580  90585  90590  90596  90601  90607  90612  90617  90623  90628
806  90634  90639  90644  90650  90655  90660  90666  90671  90677  90682
807  90687  90693  90698  90703  90709  90714  90720  90725  90730  90736
808  90741  90747  90752  90757  90763  90768  90773  90779  90784  90789
809  90795  90800  90806  90811  90816  90822  90827  90832  90838  90843

810  90849  90854  90859  90865  90870  90875  90881  90886  90891  90897
811  90902  90907  90913  90918  90924  90929  90934  90940  90945  90950
812  90956  90961  90966  90972  90977  90982  90988  90993  90998  91004
813  91009  91014  91020  91025  91030  91036  91041  91046  91052  91057
814  91062  91068  91073  91078  91084  91089  91094  91100  91105  91110
815  91116  91121  91126  91132  91137  91142  91148  91153  91158  91164
816  91169  91174  91180  91185  91190  91196  91201  91206  91212  91217
817  91222  91228  91233  91238  91243  91249  91254  91259  91265  91270
818  91275  91281  91286  91291  91297  91302  91307  91312  91318  91323
819  91328  91334  91339  91344  91350  91355  91360  91365  91371  91376

820  91381  91387  91392  91397  91403  91408  91413  91418  91424  91429
821  91434  91440  91445  91450  91455  91461  91466  91471  91477  91482
822  91487  91492  91498  91503  91508  91514  91519  91524  91529  91535
823  91540  91545  91551  91556  91561  91566  91572  91577  91582  91587
824  91593  91598  91603  91609  91614  91619  91624  91630  91635  91640
825  91645  91651  91656  91661  91666  91672  91677  91682  91687  91693
826  91698  91703  91709  91714  91719  91724  91730  91735  91740  91745
827  91751  91756  91761  91766  91772  91777  91782  91787  91793  91798
828  91803  91808  91814  91819  91824  91829  91834  91840  91845  91850
829  91855  91861  91866  91871  91876  91882  91887  91892  91897  91903

830  91908  91913  91918  91924  91929  91934  91939  91944  91950  91955
831  91960  91965  91971  91976  91981  91986  91991  91997  92002  92007
832  92012  92018  92023  92028  92033  92038  92044  92049  92054  92059
833  92065  92070  92075  92080  92085  92091  92096  92101  92106  92111
834  92117  92122  92127  92132  92137  92143  92148  92153  92158  92163
835  92169  92174  92179  92184  92189  92195  92200  92205  92210  92215
836  92221  92226  92231  92236  92241  92247  92252  92257  92262  92267
837  92273  92278  92283  92288  92293  92298  92304  92309  92314  92319
838  92324  92330  92335  92340  92345  92350  92355  92361  92366  92371
839  92376  92381  92387  92392  92397  92402  92407  92412  92418  92423

840  92428  92433  92438  92443  92449  92454  92459  92464  92469  92474
841  92480  92485  92490  92495  92500  92505  92511  92516  92521  92526
842  92531  92536  92542  92547  92552  92557  92562  92567  92572  92578
843  92583  92588  92593  92598  92603  92609  92614  92619  92624  92629
844  92634  92639  92645  92650  92655  92660  92665  92670  92675  92681
845  92686  92691  92696  92701  92706  92711  92716  92722  92727  92732
846  92737  92742  92747  92752  92758  92763  92768  92773  92778  92783
847  92788  92793  92799  92804  92809  92814  92819  92824  92829  92834
848  92840  92845  92850  92855  92860  92865  92870  92875  92881  92886
849  92891  92896  92901  92906  92911  92916  92921  92927  92932  92937














Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

850-899

No.      0      1      2      3      4      5      6      7      8      9

850  92942  92947  92952  92957  92962  92967  92973  92978  92983  92988
851  92993  92998  93003  93008  93013  93018  93024  93029  93034  93039
852  93044  93049  93054  93059  93064  93069  93075  93080  93085  93090
853  93095  93100  93105  93110  93115  93120  93125  93131  93136  93141
854  93146  93151  93156  93161  93166  93171  93176  93181  93186  93192
855  93197  93202  93207  93212  93217  93222  93227  93232  93237  93242
856  93247  93252  93258  93263  93268  93273  93278  93283  93288  93293
857  93298  93303  93308  93313  93318  93323  93328  93334  93339  93344
858  93349  93354  93359  93364  93369  93374  93379  93384  93389  93394
859  93399  93404  93409  93414  93420  93425  93430  93435  93440  93445

860  93450  93455  93460  93465  93470  93475  93480  93485  93490  93495
861  93500  93505  93510  93515  93520  93526  93531  93536  93541  93546
862  93551  93556  93561  93566  93571  93576  93581  93586  93591  93596
863  93601  93606  93611  93616  93621  93626  93631  93636  93641  93646
864  93651  93656  93661  93666  93671  93676  93682  93687  93692  93697
865  93702  93707  93712  93717  93722  93727  93732  93737  93742  93747
866  93752  93757  93762  93767  93772  93777  93782  93787  93792  93797
867  93802  93807  93812  93817  93822  93827  93832  93837  93842  93847
868  93852  93857  93862  93867  93872  93877  93882  93887  93892  93897
869  93902  93907  93912  93917  93922  93927  93932  93937  93942  93947

870  93952  93957  93962  93967  93972  93977  93982  93987  93992  93997
871  94002  94007  94012  94017  94022  94027  94032  94037  94042  94047
872  94052  94057  94062  94067  94072  94077  94082  94086  94091  94096
873  94101  94106  94111  94116  94121  94126  94131  94136  94141  94146
874  94151  94156  94161  94166  94171  94176  94181  94186  94191  94196
875  94201  94206  94211  94216  94221  94226  94231  94236  94240  94245
876  94250  94255  94260  94265  94270  94275  94280  94285  94290  94295
877  94300  94305  94310  94315  94320  94325  94330  94335  94340  94345
878  94349  94354  94359  94364  94369  94374  94379  94384  94389  94394
879  94399  94404  94409  94414  94419  94424  94429  94433  94438  94443

880  94448  94453  94458  94463  94468  94473  94478  94483  94488  94493
881  94498  94503  94507  94512  94517  94522  94527  94532  94537  94542
882  94547  94552  94557  94562  94567  94571  94576  94581  94586  94591
883  94596  94601  94606  94611  94616  94621  94626  94630  94635  94640
884  94645  94650  94655  94660  94665  94670  94675  94680  94685  94689
885  94694  94699  94704  94709  94714  94719  94724  94729  94734  94738
886  94743  94748  94753  94758  94763  94768  94773  94778  94783  94787
887  94792  94797  94802  94807  94812  94817  94822  94827  94832  94836
888  94841  94846  94851  94856  94861  94866  94871  94876  94880  94885
889  94890  94895  94900  94905  94910  94915  94919  94924  94929  94934

890  94939  94944  94949  94954  94959  94963  94968  94973  94978  94983
891  94988  94993  94998  95002  95007  95012  95017  95022  95027  95032
892  95036  95041  95046  95051  95056  95061  95066  95071  95075  95080
893  95085  95090  95095  95100  95105  95109  95114  95119  95124  95129
894  95134  95139  95143  95148  95153  95158  95163  95168  95173  95177
895  95182  95187  95192  95197  95202  95207  95211  95216  95221  95226
896  95231  95236  95240  95245  95250  95255  95260  95265  95270  95274
897  95279  95284  95289  95294  95299  95303  95308  95313  95318  95323
898  95328  95332  95337  95342  95347  95352  95357  95361  95366  95371
899  95376  95381  95386  95390  95395  95400  95405  95410  95415  95419













Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

900-949

No.      0      1      2      3      4      5      6      7      8      9

900  95424  95429  95434  95439  95444  95448  95453  95458  95463  95468
901  95472  95477  95482  95487  95492  95497  95501  95506  95511  95516
902  95521  95525  95530  95535  95540  95545  95550  95554  95559  95564
903  95569  95574  95578  95583  95588  95593  95598  95602  95607  95612
904  95617  95622  95626  95631  95636  95641  95646  95650  95655  95660
905  95665  95670  95674  95679  95684  95689  95694  95698  95703  95708
906  95713  95718  95722  95727  95732  95737  95742  95746  95751  95756
907  95761  95766  95770  95775  95780  95785  95789  95794  95799  95804
908  95809  95813  95818  95823  95828  95832  95837  95842  95847  95852
909  95856  95861  95866  95871  95875  95880  95885  95890  95895  95899

910  95904  95909  95914  95918  95923  95928  95933  95938  95942  95947
911  95952  95957  95961  95966  95971  95976  95980  95985  95990  95995
912  95999  96004  96009  96014  96019  96023  96028  96033  96038  96042
913  96047  96052  96057  96061  96066  96071  96076  96080  96085  96090
914  96095  96099  96104  96109  96114  96118  96123  96128  96133  96137
915  96142  96147  96152  96156  96161  96166  96171  96175  96180  96185
916  96190  96194  96199  96204  96209  96213  96218  96223  96227  96232
917  96237  96242  96246  96251  96256  96261  96265  96270  96275  96280
918  96284  96289  96294  96298  96303  96308  96313  96317  96322  96327
919  96332  96336  96341  96346  96350  96355  96360  96365  96369  96374

920  96379  96384  96388  96393  96398  96402  96407  96412  96417  96421
921  96426  96431  96435  96440  96445  96450  96454  96459  96464  96468
922  96473  96478  96483  96487  96492  96497  96501  96506  96511  96515
923  96520  96525  96530  96534  96539  96544  96548  96553  96558  96562
924  96567  96572  96577  96581  96586  96591  96595  96600  96605  96609
925  96614  96619  96624  96628  96633  96638  96642  96647  96652  96656
926  96661  96666  96670  96675  96680  96685  96689  96694  96699  96703
927  96708  96713  96717  96722  96727  96731  96736  96741  96745  96750
928  96755  96759  96764  96769  96774  96778  96783  96788  96792  96797
929  96802  96806  96811  96816  96820  96825  96830  96834  96839  96844

930  96848  96853  96858  96862  96867  96872  96876  96881  96886  96890
931  96895  96900  96904  96909  96914  96918  96923  96928  96932  96937
932  96942  96946  96951  96956  96960  96965  96970  96974  96979  96984
933  96988  96993  96997  97002  97007  97011  97016  97021  97025  97030
934  97035  97039  97044  97049  97053  97058  97063  97067  97072  97077
935  97081  97086  97090  97095  97100  97104  97109  97114  97118  97123
936  97128  97132  97137  97142  97146  97151  97155  97160  97165  97169
937  97174  97179  97183  97188  97192  97197  97202  97206  97211  97216
938  97220  97225  97230  97234  97239  97243  97248  97253  97257  97262
939  97267  97271  97276  97280  97285  97290  97294  97299  97304  97308

940  97313  97317  97322  97327  97331  97336  97340  97345  97350  97354
941  97359  97364  97368  97373  97377  97382  97387  97391  97396  97400
942  97405  97410  97414  97419  97424  97428  97433  97437  97442  97447
943  97451  97456  97460  97465  97470  97474  97479  97483  97488  97493
944  97497  97502  97506  97511  97516  97520  97525  97529  97534  97539
945  97543  97548  97552  97557  97562  97566  97571  97575  97580  97585

046 

07 

589 

07 

594 

*^97 

508 

07 

608 

07 

607 

07 

612 

07 

617 

07 

621 

07 

626 

97 

680 

047 

07 

635 

07 

640 

07 

644 

07 

640 

07 

653 

07 

658 

07 

663 

97 

667 

07 

672 

07 

676 

048 

97 

681 

97 

685 

07 

600 

07 

695 

07 

690 

07 

704 

07 

708 

07 

718 

07 

717 

07 

722 

040 

07 

727 

07 

731 

97 

736 

07 

740 

07 

745 

07 

740 

07 

754 

07 

759 

07 

768 

97 

768 



0 


1 


3 


8 


□ 

L 

5 

1 

6 


7 

8 


9 


900-949 











Table 7.—Five-place Common Logarithms of Numbers.—(Continued)

950-1000

No.     0       1       2       3       4       5       6       7       8       9

950   97 772  97 777  97 782  97 786  97 791  97 795  97 800  97 804  97 809  97 813
951   97 818  97 823  97 827  97 832  97 836  97 841  97 845  97 850  97 855  97 859
952   97 864  97 868  97 873  97 877  97 882  97 886  97 891  97 896  97 900  97 905
953   97 909  97 914  97 918  97 923  97 928  97 932  97 937  97 941  97 946  97 950
954   97 955  97 959  97 964  97 968  97 973  97 978  97 982  97 987  97 991  97 996
955   98 000  98 005  98 009  98 014  98 019  98 023  98 028  98 032  98 037  98 041
956   98 046  98 050  98 055  98 059  98 064  98 068  98 073  98 078  98 082  98 087
957   98 091  98 096  98 100  98 105  98 109  98 114  98 118  98 123  98 127  98 132
958   98 137  98 141  98 146  98 150  98 155  98 159  98 164  98 168  98 173  98 177
959   98 182  98 186  98 191  98 195  98 200  98 204  98 209  98 214  98 218  98 223
960   98 227  98 232  98 236  98 241  98 245  98 250  98 254  98 259  98 263  98 268
961   98 272  98 277  98 281  98 286  98 290  98 295  98 299  98 304  98 308  98 313
962   98 318  98 322  98 327  98 331  98 336  98 340  98 345  98 349  98 354  98 358
963   98 363  98 367  98 372  98 376  98 381  98 385  98 390  98 394  98 399  98 403
964   98 408  98 412  98 417  98 421  98 426  98 430  98 435  98 439  98 444  98 448
965   98 453  98 457  98 462  98 466  98 471  98 475  98 480  98 484  98 489  98 493
966   98 498  98 502  98 507  98 511  98 516  98 520  98 525  98 529  98 534  98 538
967   98 543  98 547  98 552  98 556  98 561  98 565  98 570  98 574  98 579  98 583
968   98 588  98 592  98 597  98 601  98 605  98 610  98 614  98 619  98 623  98 628
969   98 632  98 637  98 641  98 646  98 650  98 655  98 659  98 664  98 668  98 673
970   98 677  98 682  98 686  98 691  98 695  98 700  98 704  98 709  98 713  98 717
971   98 722  98 726  98 731  98 735  98 740  98 744  98 749  98 753  98 758  98 762
972   98 767  98 771  98 776  98 780  98 784  98 789  98 793  98 798  98 802  98 807
973   98 811  98 816  98 820  98 825  98 829  98 834  98 838  98 843  98 847  98 851
974   98 856  98 860  98 865  98 869  98 874  98 878  98 883  98 887  98 892  98 896
975   98 900  98 905  98 909  98 914  98 918  98 923  98 927  98 932  98 936  98 941
976   98 945  98 949  98 954  98 958  98 963  98 967  98 972  98 976  98 981  98 985
977   98 989  98 994  98 998  99 003  99 007  99 012  99 016  99 021  99 025  99 029
978   99 034  99 038  99 043  99 047  99 052  99 056  99 061  99 065  99 069  99 074
979   99 078  99 083  99 087  99 092  99 096  99 100  99 105  99 109  99 114  99 118
980   99 123  99 127  99 131  99 136  99 140  99 145  99 149  99 154  99 158  99 162
981   99 167  99 171  99 176  99 180  99 185  99 189  99 193  99 198  99 202  99 207
982   99 211  99 216  99 220  99 224  99 229  99 233  99 238  99 242  99 247  99 251
983   99 255  99 260  99 264  99 269  99 273  99 277  99 282  99 286  99 291  99 295
984   99 300  99 304  99 308  99 313  99 317  99 322  99 326  99 330  99 335  99 339
985   99 344  99 348  99 352  99 357  99 361  99 366  99 370  99 374  99 379  99 383
986   99 388  99 392  99 396  99 401  99 405  99 410  99 414  99 419  99 423  99 427
987   99 432  99 436  99 441  99 445  99 449  99 454  99 458  99 463  99 467  99 471
988   99 476  99 480  99 484  99 489  99 493  99 498  99 502  99 506  99 511  99 515
989   99 520  99 524  99 528  99 533  99 537  99 542  99 546  99 550  99 555  99 559
990   99 564  99 568  99 572  99 577  99 581  99 585  99 590  99 594  99 599  99 603
991   99 607  99 612  99 616  99 621  99 625  99 629  99 634  99 638  99 642  99 647
992   99 651  99 656  99 660  99 664  99 669  99 673  99 677  99 682  99 686  99 691
993   99 695  99 699  99 704  99 708  99 712  99 717  99 721  99 726  99 730  99 734
994   99 739  99 743  99 747  99 752  99 756  99 760  99 765  99 769  99 774  99 778
995   99 782  99 787  99 791  99 795  99 800  99 804  99 808  99 813  99 817  99 822
996   99 826  99 830  99 835  99 839  99 843  99 848  99 852  99 856  99 861  99 865
997   99 870  99 874  99 878  99 883  99 887  99 891  99 896  99 900  99 904  99 909
998   99 913  99 917  99 922  99 926  99 930  99 935  99 939  99 944  99 948  99 952
999   99 957  99 961  99 965  99 970  99 974  99 978  99 983  99 987  99 991  99 996
1000  00 000  00 004  00 009  00 013  00 017  00 022  00 026  00 030  00 035  00 039
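The entries of Table 7 are the five-place mantissas of the common logarithms of the four-digit numbers formed by the row number followed by the column digit. As a check on any single entry, the following minimal Python sketch recomputes a mantissa in the table's "97 772" layout. The function names `mantissa5` and `table_row` are illustrative assumptions and are not part of the book.

```python
import math


def mantissa5(n: int) -> str:
    """Five-place mantissa of log10(n), spaced like the table ("97 772")."""
    frac = math.log10(n) % 1.0                # drop the characteristic
    digits = round(frac * 100_000) % 100_000  # keep five decimal places
    return f"{digits // 1000:02d} {digits % 1000:03d}"


def table_row(no: int) -> list[str]:
    """Ten entries of the row labelled `no` (columns 0 through 9)."""
    return [mantissa5(10 * no + k) for k in range(10)]


if __name__ == "__main__":
    # Row 950 holds the mantissas of log 9500 through log 9509.
    print(950, "  ".join(table_row(950)))
```

For example, `mantissa5(9500)` gives "97 772", so log 9500 = 3.97772 once the characteristic 3 is supplied.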















Index 


A 

Accuracy, testing a statistical schedule
for, 42

Actuarial method, 24-25
Alienation, coefficient of, 182, 190, 
193 

Analysis of statistical data, 50 
Arithmetic mean, 99 

(See also Mean, arithmetic)
Arkin, Herbert, 50 
Array, frequency, 60-61, 66 
Attribute, 28, 231, 234 
Average, need for, 94 

representativeness of, 108, 130- 
131 

Average deviation, 122-124 
(See also Deviation, mean)
Averages, 94 

B 

Bar charts, 88-89 

Barr, A. S., C. V. Good, and D. E. 
Scates, 30 

Baten, W. D., 170, 254 

Bernard, L. L., 9 

Bernoulli sample, 224 

Beta, measure of kurtosis, 168 

Bias, 32 

Bimodal, 95 

Binet, Stanford-, intelligence test, 12, 
19 

Binomial coefficients, 305 
Binomial distribution, 151-156 
asymmetrical (skewed), 156 
formulas for, 151, 152 
mean of, formula for the, 155
standard deviation of, formula for 
the, 155, 234 


Binomial distribution, universe, 233 
Biserial correlation, 199-203 

(See also Correlation, biserial)
Bowley, A. L., 51, 53 
Brown, Lyndon O., 55 
Burgess, E. W., and L. J. Cottrell, 
20 

Burtt, E. A., 11, 30 
C 

Camp, B. H., 170, 195 
Campbell, N. R., 23 
Caption of a frequency table, 71 
Cardinal number, 15 
Cards, machine tabulating, 48-49 
Causal system, sampling a, 229 
Causes, search for, 26 
Census, United States Bureau of the, 
3, 33 

Census of Agriculture, U. S., 1935, 
34 

definition of a “farm,” 34-36 
other definitions, 36 
Chaddock, R. E., 30, 75, 121, 142, 
195, 297 

Changing universe, 222 
Chapin, F. Stuart, 12, 20, 23, 30, 55 
Charlier check, 127 
Chi-square, χ², 304
substitute for standard error of 
coefficient of contingency, 217 
test applied to a contingency 
table, 148-149, 205-206, 208 
to a fourfold table, 209 
used to test significance of differences
between two frequency
distributions, 269-272 
Class intervals, 61 
selection of, 64-68 




Class limits, continuous variable, 69 
discrete variable, 69 
Classes, 61 
Classification, 10 
principles of, 69 
reliability of, 197-198 
Coding, 129 

use of, in computing measures of 
dispersion and partition, 129 
Coefficient of alienation, in linear 
correlation, formulas for, 182, 
183, 190 

Coefficient of contingency, 203-208 
Chi-square as substitute for 
standard error, 217 
computation of, 204-206 
correction for broad grouping, 207 
formulas for, 206 
interpretation of, 208 
sign of, 208 
standard error of, 217 
tabular arrangement for, 204 
Coefficient of correlation, r₄, for
fourfold tables, 211 
standard error of, 217 
Coefficient of linear correlation, r, 
grouped data, formulas for, 
185 

significance of, 257-258 
significance of the difference 
between two r's, 268-269 
values of the correlation coefficient
for different levels of
significance, 306 
values of z for given values of r, 
307-308 

ungrouped data, formulas for, 
181, 183, 185 
meaning of, 182-184 
size of sample, 182 
Coefficient of regression, linear
correlation, 180

Coefficient of variation, 129-131 
(See also Variation, coefficient 
of) 

Combinations, 143-144 
formula for, 144 


Comparable measures (scores, 
scales), 136-139 
percentiles, 137-138 
Q scores, 137 
standard scores, 136 
Concomitant variation, 26 
Confidence limits, 248-249 
(See also Fiducial limits) 
Contingency, coefficient of, 203-208 
(See also Coefficient of contingency)

Contingency table, 25 
Continuous variable, 28, 62 
Control group, 26 
Cooperative definition, 44-45 
Coordinates of a point, 82, 172 
Correlation, biserial, 199-203 
formula for rbis, 201
rbis compared with r, 203
scatter diagram, 200
sign of rbis, 203
standard error of rbis, 217
table, 201 

contingency, 203-208 

(See also Coefficient of contingency)

in fourfold tables, 208-217 

(See also Yule's Q; Coefficient
of correlation, r₄, for fourfold
tables; Tetrachoric
correlation)
nonquantitative, 197 
biserial, 199-203

(See also Biserial correlation;
Correlation, biserial)

choice of method, 198-199 
coefficient of contingency, 203- 
208 

(See also Coefficient of 
contingency; Correlation 
contingency) 

r₄, for fourfold tables, 211
tetrachoric correlation, 211-217 
Yule's Q, 210, 213
rank, 191 
formula for, 191 





Correlation, simple linear quantitative,
171-196

grouped data, correlation table 
and its explanation, 186-189 
formula for coefficient of 
alienation, 190 
formula for r, 185 
formula for standard error of 
estimate, 190 

formula for Y-intercept, 190
formulas for regression coefficient,
190

ungrouped data, coefficient of
alienation, k, 182-183
coefficient of correlation, r,
measuring amount of correlation,
180-184
correlation due to a single 
case, 172 

does not extend beyond data, 
173-174 

formulas for r, 181-183 
goodness of fit and standard 
error of estimate, 177-180 
line of regression, 175-180, 
184, 185 

negative, 174-175
normal equations, 175-176 
positive, 174 

regression coefficient, 180 
scatter diagram, 171-174 
tetrachoric, 211-217 

(See also Tetrachoric correlation)

between time series, 286-288 
Cottrell, L. J., and E. W. Burgess, 20 
Counting, 10 

Cowden, D. J., and F. E. Croxton, 
23, 93, 121, 142, 170, 182, 195, 
254, 297 

Critical ratio, 258 
Crosshatching, 90-91 
Croxton, F. E., and D. J. Cowden, 
23, 93, 121, 142, 170, 182, 195, 
254, 297 

Culver, Dorothy C., 34 
Cumulative frequency curve (ogive), 
79-81 


Curve, of error, 157 

(See also Normal curve) 
of probabilities, 157 

(See also Normal curve)
Cycles, correlation between, in two 
series, 286-288 
short-term, 283-286 
short-term, freed from seasonal 
fluctuations, 295-296 
in time series, 286 

D 

Dampier-Whetham, W. C. D., 9 
Davenport, C. B., and M. P. 
Ekas, 217, 220 

Davies, G. R., and Dale Yoder, 
142, 195, 297 
Deciles, 131, 134 
Definition, 10, 44-45 
Degrees of freedom, 148-149 
Delta, Δ, 95

Deviation, mean or average, 122-124 
formula for, 123 
measures of, 122-142 
from an average, 122 
use of coding in computation, 
129 

quartile, 135-136 

(See also Quartile deviation)
standard, σ, 124-129

computation, grouped data, long 
method, 127 
short method, 127-128 
ungrouped data, long method, 
126 

short method, 126 
formula for, combined distributions,
128-129
Sheppard's correction, 128
ungrouped and grouped data, 
124-125 
Dewey, John, 30 
Dichotomy, 24, 208 
Differences, between any two statistics,
259-260

significance of sampling, 255-275 





Differences, between statistics from 
more than two samples, 272-273 
Discrete aggregate, 18 
Discrete variable, 62 
Dispersion (see Deviation) 
Distribution, 232 
sampling, 232 

(See also Frequency distribution)

Districts, 243 

standard errors of sampling, 243- 
244, 246 

Durost, W. N., and Helen M. 
Walker, 75 

E 

Editing the statistical schedule, 47 
Ekas, M. P., and C. B. Davenport, 
217, 220 

Elderton, W. P., 220 
Elmer, M. C., 65 
Empirical standard error, 232 
Equally likely events, 149 
Error, accumulative, 52-63 
curve of, 157 

(See also Normal curve) 
of observation (record), 50-54 
probable, 161, 232 

(See also Probable error) 
in a ratio, 53 
relative, 52 

standard, 161, 217, 232-249 
(See also Standard error) 
Errors, biased, 50, 53 
unbiased (compensating), 52 
Event, 145, 221, 243-244, 246 
Existent universe, 222 
Expected value, 221 
Experimental group, 26 
Exponent, 109 

Ezekiel, Mordecai, 29, 182, 183, 195 
F 

Factor control, 24 
Failure (unsuccessful event), 149, 
222 


Farm, definition of a, U. S. Census 
of Agriculture, 1935, 33-36 
Federal agencies as sources of 
statistical data, 33 
Fiducial limits, 248-249 

(See also Confidence limits) 
Final test, 28 
Fine, H. B., 170 
Fisher, R. A., 30, 182, 195, 306 
Fourfold tables, correlation in, 208-217

Fourth moment, 165 
Freedom, degrees of, 148, 149 
(See also Degrees of freedom) 
Frequencies, 60 
Frequency, 235-239 
standard error of simple sampling 
of a, 235-239 

of stratified sampling of a, 237 
Frequency array, 60-61, 66 
Frequency distribution, 60-69, 71- 
72, 107-108, 269-272 
continuous variable, tabulation of, 
68-69 

discrete variable, tabulation of, 
60-68 

rules of table form, 71-72 
shapes of, 107-108 
significance of the difference between
two or more, 269-272
Frequency distributions, nonquantitative
variable, tabulation of,
69-70

Frequency polygon, 76-79 
Fry, C. Luther, 55 
"Fundamental interval," in social
measurement, 19 

G 

g₁, index of skewness, 165
formula for, 165 
significance of, 258-259 
(See also Skewness) 
g₂, index of kurtosis, 165
formula for, 165 
significance of, 258-259 
(See also Kurtosis) 





Galton, Sir Francis, 3 
Garrett, H. E., 75, 142, 195 
Gaussian curve, 157 

(See also Normal curve)
Geometric mean, 109-113 
applied to population growth, 
111-113 

formulas for, 109-110 
Gevorkiantz, S. R., and B. D. 

Mudgett, 231 
Giddings, F. H., 9 
Good, C. V., A. S. Barr, and D. E. 
Scates, 30 

Goodness of fit of regression line, 
177-178 

Goulden, C. H., 30
Graphs, 76 
maps, 90-91 
misuse of, 86 
pictographs, 90-91 
pie chart, 89 

steepness of a line, meaning of, 116 
three-dimension, 89 

(See also Bar charts; Cumulative
curve (ogive); Histogram;
Lorenz curve; Polygon;
Population growth graphs;
Semilogarithmic graph;
Smoothed curve)

Gross reproduction rate, 116-117 
Grouping errors, 128 
Groups of events, 243 
standard errors of sampling, 243- 
244, 246 

Guilford, J. P., 18 
H 

Heterogeneous universe, 223, 246 
Histogram, 76-79 

Holzinger, Karl J., 18, 170, 217, 220 
Homogeneous universe, 223 
Hooton, A. E., 219 
Horst, Paul, 137 
Hypothesis, 32 
null, 154 

Hypothetical universe, 222, 225 


I 

Independent events, 260 
Index, 15, 16, 44, 45 
Individual, the, and statistics, 7 
Infinite universe, 222 
Instructions accompanying a statistical
schedule, 39-40
Intangibles, measurement of, 18-20 
Intercept on the Y axis, 175, 190 
Interfering variables, 29 
Interpretation of statistical results, 7 
Interquartile range, 136 
graph of, 136 

Interviewer, the statistical, 42, 47 
J 

J-type distribution, 107-108 
Jocher, Katherine, and Howard W. 

Odum, 55 
Johnson, H. M., 23 
Judges, use of, in social measurement,
15, 16

K 

Karsten, K. G., 93 
Kelley, T. L., 142 

Kendall, M. G., and G. U. Yule, 
24, 75, 121, 170, 175, 182, 196, 
220, 254 
King, W. I., 105 
Kirkpatrick, Clifford, 23 
Kuhlman, A. F., 34 
Kurtosis, 165-168 
formula for, 165 
g₂, index of, 165

L 

Laboratory sciences, 5 
Leptokurtic, 165 

Less-than cumulative frequency 
curve (ogive), 79-81 
Levels of significance, 256-257 
Lexis sample, 231 





Limited universe, 222 
correction of standard error for, 
242-243 

Lindquist, E. F., 142, 195 
Line of regression, 175-180 

(See also Regression, line of)
Linear correlation, 171-196
(See also Correlation)
Logarithms, 323 
five-place, 323-342 
Lorenz curve, 82-83 
Lundberg, G. A., 9, 23, 55 

M 

McCormick, Thomas C., 50, 212 
Maps, 90-91 

Marriage, predicting success or 
failure in, 20 
Matching, 26 

Mathematical statistics, 3, 5 
Mean, arithmetic, 100 
characteristics and interpretation, 
104-109 

definition of, 100 

grouped data, equal classes, short 
method, 101-103 
long method, 100
unequal classes, short method, 
103-104 

significance of the difference between
two means, 264-266
standard error of simple sampling 
of the, 239-240 

of stratified sampling of the, 240 
of two distributions combined, 63, 
104 

ungrouped data, 99 
weighted, 63, 104 
Mean, geometric, 109-113 
(See also Geometric mean)

Mean deviation, 122-124 

(See also Deviation, mean)
Mean probability, 230 
Measurement, of amount, 11 
rules of, 21-22 

Mechanical method, statistics not a, 

8 


Mechanical tabulation of statistical 
data, 48-50 
Median, 97 

characteristics and interpretation 
of, 104-109 
definition of, 97 
grouped data, 97-99 
ungrouped data, 96-97 
Merrill, Maud A., and Lewis M. 

Terman, 23 
Merton, R. K., 16 
Mesokurtic, 165 
Mid-point, 62-64 

Mills, F. C., 9, 75, 142, 196, 254, 
283, 297 
Mode, 94-95 
bimodal distribution, 95 
characteristics and interpretation, 
104-109 
definition, 94 
formula for, 95 
Moments, 165-166 
Mu, μ, 165

Mudgett, B. D., 75, 93 
and S. R. Gevorkiantz, 231 
Mutually exclusive events, 146 

N 

National Unemployment Census of 
1937, 39-42 

Negative correlation, 174-175 
Net reproduction rate, 116-117 
Nonquantitative methods, role of, 31 
Nonquantitative variable, defined, 
69 

tabulation of, 69-70 
Normal distribution (curve), 156- 
163 

approximation of symmetrical binomial,
156-157
areas and ordinates of, 299-303 
calculation of ordinates of, 158 
formulas for, 157 
graphs of, 156 
table showing a, 159 
use in determining probabilities, 
160-163 





Normal equations, straight line, 
175-176 

Normalization, 137 

Nu, ν, 165

Null hypothesis, 154 

O 

Odum, Howard W., and Katherine 
Jocher, 55 
Ogburn, W. F., 9 
Ogive, 79-81 
Ordered data, 11 
Ordinal number, 15 
Ordinate, 158 
Origins of statistics, 3 

P 

Palmer, Vivien M., 55 
Parameter, 221 
definition of, 221, 231 
Parent, synonym for universe, 221 
Partition values, 131-136 
decile (see Decile) 
median (see Median) 
percentile (see Percentile) 
quartile (see Quartile) 

Pearson, Karl, 3, 217 
Percentile, 131-134, 136 
formula for, 133 
Percentile rank, 134-136 
formula for, 135 
Permutations, 143-144 
formula for, 143 

Peters, C. C., and W. R. Van Voorhis,
30, 207, 217, 220, 254
Pictographs, 90-91 
Pie chart, 89 
Platykurtic, 165 

Poisson (stratified) sample, 224, 
230-231, 234 

Polygon, frequency, 76-79 
Population, synonym for universe,
221 

Population growth, 82, 111 
estimates of, 111-113 
graphs of, 82-87 


Population rates, 114-117 
gross reproduction rate, 115-117 
meaning of, 114-116 
net reproduction rate, 115-117 
standard error of, 244-246 
Positive correlation, 174 
Prediction of a mean vs. individual 
values, 250 
Pretest, 28 

Primary statistical data, 37 
Probabilities, curve of, 157 
(See also Normal curve) 
Probability, 145-151 
addition theorem, 146 
definition of, 145 
mean, 230 

product theorem, 147 
of r successes in n trials, formula 
for, 150 

Probable error, 161, 232 
Problem in statistical inquiry, 31 
Proportion, 238 

standard error of simple sampling 
of, 238-239 

of stratified sampling of, 239 
Proportional sample, 230 
Proportions, 266 

significance of the difference between
two, 266-268
Punching machine, 49 

Q 

Q, Yule's coefficient of correlation
for fourfold tables, 210 
Q scores, 137 
Qualitative data, 197 
Quality, 4 

Quantification of social data, 10-23 
Quantity, 4 

Quartile deviation, 135-136 
formula for, 136 
Quartiles, 131-134, 136-137 
Questionnaire, 37 
Quetelet, 3 

R 

Random, 224 
Random sample, 224-225 





Random sampling numbers, 226-
228 

Randomization, principle of, 27-28 
Range, 60 

Rank correlation, 191-192 
formula for, 191 
in time series analysis, 287 
Ranking, 11 
Rates, 109-110, 113 
Rating, 11, 45 
Ratio, 53, 109 
Recurrent universe, 222 
Regression coefficient, linear correlation,
180, 190

Regression equations, linear correlation,
175-176
error in predicting a mean vs. 

individual values, 250 
formula for, when r is known,
184-185

formulas for, 175, 176 
geometric meaning of, 175 
goodness of fit and standard error 
of estimate, 177-180 
normal equations, 175-176 
use of, for prediction, 179 
Relationship (gross) between two 
factors: nonquantitative correlation,
197

(See also Correlation, nonquantitative)

Relationship (gross) between two 
factors: simple linear quantitative
correlation, 171-196
(See also Correlation, linear) 
Reliability, 20, 42-43 
Repeated trials, 151 
Replication, 27 
Representative data, 6 
sample, 246 

Representativeness of an average, 
108, 130-131 
of a sample, 250-252 
Rice, Stuart A., 9 
Richardson, C. H., 18, 170
Rider, P. R., 148 


Root, mean square-, deviation, 124- 
129 

(See also Deviation, standard) 
"Rounding off," 53
Ruling of a frequency table, 72 

S

Sample, 6 
Bernoulli, 224 
large, 234 
Lexis, 231 

Poisson (stratified), 224, 230-231, 
234 

proportional, 230 
random, 224-228, 255-256
representative, 246, 250-252 
simple, 224-226, 229-231, 234, 256 
size of, in relation to standard 
error, 234, 237 

stratified (Poisson), 224, 230-231, 
234 

taking the, 224-232 
Sampling, 221, 224-232
confidence (fiducial) limits, 248 
by groups of events, 228-229 
general theory of, 232-234 
random sampling numbers, 226- 
228 

unit of, 243 

Sampling differences, 255-275 
(See also Significance)

Sampling distribution, 232 
Sampling errors, 234 
simple sampling errors applied to 
random and stratified samples,
234

(See also Standard error) 

Scale, the, 14 

Chapin’s socioeconomic, 12, 20 
graphic rating, 14 
Thurstone’s attitude, 15-17 
Scates, Douglas, 23, 30 
Scatter diagram, 171-174, 188 
Schedule, editing, 48 
the statistical, 37-40 
testing, 42-47 
Scores, 12 





Scoring, 12 

Seasonal fluctuations in time series, 
283-295 

Second moment, 165 
Secondary statistical data, 33-36 
Secular trend, 277-283
Semi-interquartile range, 135-136
(See also Quartile deviation)
Semilogarithmic paper, 84-85, 88 
Sheppard’s correction, 128 
Sigma, Σ, σ, 99, 124
Significance of a correlation coefficient,
257-258
of the difference between any two
correlated statistics, 259-260
of the difference between any two
independent statistics, 260
of the difference between the
combined mean of two simple
samples from the same universe
and the mean of either
one of the samples, 266
of the difference between the
means of two samples supposed
to be simple samples
from the same universe, 264-265
of the difference between the
means of two simple samples
from different universes, 265-266
of the difference between statistics
from more than two
samples, 272-273
of the difference between two correlated
means, 261-263
of the difference between two correlation
coefficients, 268-269
of the difference between two
independent means, 263-264
of the difference between two or
more frequency distributions,
269-272
of the difference between two
proportions, 266-268
of g₁ and g₂, 258-259
levels of, 256-259
meaning of tests of, 255-257


Significance of sampling differences, 
255-275 
of a sum, 269 

Significant figures, number of, 53 
Simple sample, 224-226, 229-231 
error of sampling applied to 
random and stratified samples,
234

Simple sampling, 269 
test of the hypothesis of, 269 
Simplicity the statistical ideal, 8 
Size of sample, 234, 237, 246-249 
Skewed frequency distribution, 107 
binomial, 156 
formulas for, 164, 165 
geometric mean of, 110 
graphs of, 107, 164 
meaning of the standard deviation 
or standard error of, 161 
representativeness of averages of, 
107, 108 

table showing a, 164 

(See also g₁, index of skewness)
Slope of line, 175 
Smith, James G., 9, 170 
Smoothed frequencies or curve, 79 
Snedecor, G. W., 29 
Social sciences, 4, 5, 6 
Social statistics, 3 

Socioeconomic status, Chapin’s scale 
for measuring, 12 
Sociological journals, 32 
Sorenson, H., 75, 142 
Sorting machine, electric, 49-50 
Squares and square roots, 309-322 
Standard deviation, σ, 124-129
(See also Deviation, standard) 
Standard error, 161 
of arithmetic mean, 239-240 
controlled by size of sample, 
246-249 

corrected for limited universe, 
242-243 

of a frequency, 235-237 
of a population rate, 244-246 
in predicting a mean vs. individual 
values from a regression equation,
250





Standard error of a proportion, 233- 
239 

of biserial r, 217 

of coefficient of contingency, C, 
217 

empirical, 232 
of standard deviation, 241 
stratified or Poisson sampling, 
of arithmetic mean, 240 
of a frequency, 237 
of a proportion, 239 
of tetrachoric r, 217 
theoretical, 232 

when unit of sampling is a group 
of events or a district, 243- 
244, 246 
of Yule's Q, 217 

Standard error of estimate, linear 
correlation, 177-180 
formulas for, 178, 190 
meaning of, 179 
Standard scores, 136 
Stanford-Binet intelligence test, 12, 
19 

Statistic, definition of, 221 
true, 136 

Statistics, and the individual, 7 
the method of probabilities, 4 
origins of, 3 
social, 3 

Statistics not a mechanical method, 

8 

Steepness of a line graph, meaning 
of, 116 

Straight-line relationship, 19, 277- 
281, 291 

Stratified sample, 224 
errors of simple sampling applied 
to, 234 
universe, 246 

Stub of a frequency table, 71 
Success, i.e., successful event, 149, 
222 

Sum, significance of a, 269 
Summation, 99 

Symmetrical frequency distribution, 
graph of, 106 


Symmetrical frequency distribution, 
representativeness of average 
of, 106-108 
Symonds, P. M., 18

T 

Tables, caption, 71 
rules of form for frequency, 71-72 
ruling, 72 
statistical, 41-42 
stub, 71 
title, 71 

Tabulation of frequency distribu¬ 
tions, hand methods, 59-75 
of statistical data, mechanical 
methods, 48-50 

Tabulating machine, electric, 50 
Tallying, 60 

Terman, Lewis M., and Maud A. 

Merrill, 23 
Test, final, 28 

Tetrachoric correlation, 211-217 
computing diagrams for, 215-217 
formulas for, 212 
standard error of, 217 
Theoretical standard error, 232 
Thermometer, 18, 19 
Third moment, 165 
Thorndike, E. L., 45 
Three-dimension graphs, 89 
Thurstone, L. L., attitude scale, 
15-17 

computing diagrams for the tetrachoric
correlation coefficient,
215-216 

"Fundamentals of Statistics," 196
Time series, analysis, 276 
correlation between short-term 
cycles of two time series, 
286-288 

graphs of, 82-87, 277 
seasonal fluctuations, 288-296 
secular trend, a moving average, 
281-283 

straight line, 277-281 
short-term cycles, 283-286 
freed from seasonal fluctuations, 
295-296 





Tippett, L. H. C., 170 
random sampling numbers, 226- 
228, 254 

Title of a frequency table, 71 
Transcription sheet, statistical, 41 
Treloar, A. E., 148, 170, 254 
Trend, 277 
secular, 277-283

U 

Unit of sampling, 221, 243-244, 246 
Units, equality of, in social measurement,
12, 14, 15, 18, 19, 21, 59
Universe, 136, 221 
binomial, 233 
changing, 222 
existent, 222, 226 
heterogeneous, 223, 229, 246 
homogeneous, 223, 229, 234 
hypothetical, 222, 225-226, 229- 
230, 234 

infinite, 222, 228-229 
limited, 222, 226, 228, 242-243 
mixed, 246 
recurrent, 222 
stratified, 246 
unique, 222 
Unordered data, 10 

V 

Validity, 20, 42-46 

Van Voorhis, W. R., and C. C. 

Peters, 30, 207, 217, 220, 254 
Variable, 62, 231 
continuous, 28, 62 
discrete, 62 

Variables, interfering, 28 
Variance, 128 


Variation, coefficient of, 129-131 
for comparing variation, 131 
formulas of, 130 

as a measure of the representativeness
of an average, 130-131

need for, 129-130 

W


Walker, Helen M., 9 
and W. N. Durost, 75 
Waugh, A. E., 297 
Weighted arithmetic mean, 63, 104 
Weighting, 12 
Whelpton, P. K., 32 
White, R. C., 75, 142, 196, 297 
Wolf, A., 30 

X 

X axis, 77 

χ² (see Chi-square)

Y 

Y axis, 77 

Y-intercept, 175, 190
Young, Pauline V., 55 
Yule, G. U., and M. G. Kendall, 24, 
75, 121, 170, 175, 182, 196, 
220, 254 

Yule's Q, coefficient of correlation 
for fourfold tables, 210 
standard error of, 217 

Z 

Z, values of, for given values of r, 
307-308 

Zero point on scale, 12, 14, 15, 19, 22





